Progress in Mathematical Physics 
69 


Philippe Blanchard 


Mathematical 
Methods in 
Physics 


Distributions, Hilbert Space Operators, Variational Methods, and
Applications in Quantum Physics 


Second Edition 


Birkhäuser


Progress in Mathematical Physics 


Volume 69 


Editors-in-chief 

Anne Boutet de Monvel, Université Paris VII UFR de Mathématiques,
Paris Cedex 05, France

Gerald Kaiser, Center for Signals and Waves, Portland, Oregon, USA 


Editorial Board 
C. Berenstein, University of Maryland, College Park, USA 
Sir M. Berry, University of Bristol, UK 
P. Blanchard, University of Bielefeld, Germany 
M. Eastwood, University of Adelaide, Australia 
A.S. Fokas, University of Cambridge, UK 
F. W. Hehl, University of Cologne, Germany 
and University of Missouri-Columbia, USA 
D. Sternheimer, Université de Bourgogne, Dijon, France 
C. Tracy, University of California, Davis, USA 


For further volumes: 
http://www.springer.com/series/4813


Philippe Blanchard • Erwin Brüning


Mathematical Methods 
in Physics 


Distributions, Hilbert Space Operators, 
Variational Methods, and Applications 
in Quantum Physics 


Second Edition 


Birkhäuser


Philippe Blanchard
Abt. Theoretische Physik
Universität Bielefeld Fak. Physik
Bielefeld
Germany

Erwin Brüning
School of Mathematics, Statistics, and Computer Science
University of KwaZulu-Natal
Durban
South Africa
ISSN 1544-9998 ISSN 2197-1846 (electronic) 
Progress in Mathematical Physics 
ISBN 978-3-319-14044-5 ISBN 978-3-319-14045-2 (eBook) 


DOI 10.1007/978-3-319-14045-2 


Library of Congress Control Number: 2015931210 
Mathematics Subject Classification (MSC): 46-01, 46C05, 46F05, 46N50, 47A05, 47L90, 49-01, 60E05, 
81Q10 


Springer Cham Heidelberg New York Dordrecht London 

© Springer International Publishing Switzerland 2003, 2015 

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the 
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, 
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information 
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology 
now known or hereafter developed. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication 
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant 
protective laws and regulations and therefore free for general use. 

The publisher, the authors and the editors are safe to assume that the advice and information in this book 
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the 
editors give a warranty, express or implied, with respect to the material contained herein or for any errors 
or omissions that may have been made. 


Printed on acid-free paper 


Springer is part of Springer Science+Business Media (www.springer.com) 


Dedicated to the memory of 
Yurko Vladimir Glaser and Res Jost, 
mentors and friends 


Preface to the Second Edition 


The first edition of this book was published in 2003. Let us first thank everyone 
who, over the past 11 years, has provided us with suggestions and corrections for 
improving this first edition. We are extremely grateful for this help. 

The past decade has brought many changes, but the aim of this book remains the 
same. It is intended for graduate students in physics and mathematics and it may also 
be useful for theoretical physicists in research and in industry. As the extended title 
of this second edition indicates, we have focused our attention to a large extent on 
topical applications to Quantum Physics. 

With the hope that this book would be a useful reference for people applying math- 
ematics in their work, we have emphasized the results that are important for various 
applications in the areas indicated above. This book is essentially self-contained. 
Perhaps some readers will use this book as a compendium of results; this would be a 
pity, however, because proofs are often as important as results. Mathematical physics 
is not a passive activity, and therefore the book contains more than 220 exercises to 
challenge readers and to facilitate their understanding. 

This second edition differs from the first through the reorganization of certain 
material and the addition of five new chapters which have a new range of substantial 
applications. 

The first addition is Chap. 13 “Sobolev spaces” which offers a brief introduction 
to the basic theory of these spaces and thus prepares their use in the study of linear 
and nonlinear partial differential operators and, in particular, in the third part of this 
book dedicated to “Variational Methods.” 

While in the first edition Hilbert-Schmidt and trace class operators were only 
discussed briefly in Chap. 22 “Special classes of bounded operators,” this edition 
contains a new Chap. 26 “Hilbert-Schmidt and trace class operators.” The remainder 
of Chap. 22 has been merged with the old, shorter Chap. 23 “Self-adjoint Hamilton 
operators” to form the new Chap. 23 “Special classes of linear operators,” which now 
also contains a brief section on von Neumann’s beautiful application of the spectral 
theory for unitary operators in ergodic theory. 

Chapter 26 “Hilbert-Schmidt and trace class operators” gives a fairly compre- 
hensive introduction to the theory of these operators. Furthermore, the dual spaces 
of the spaces of compact and of trace class operators are determined, which allows
a thorough discussion of several locally convex topologies on the space B(H) of all 
bounded linear operators on a separable Hilbert space H. These are used later in
the chapter “Operator algebras and positive mappings” to characterize normal states 
on von Neumann algebras. This chapter also contains the definition of the partial 
trace of trace class operators on the tensor products of two separable infinite dimen- 
sional Hilbert spaces and studies its main properties. These results are of particular 
importance in the theory of open quantum systems and the theory of decoherence. 

The motivation for the new Chap. 29 “Spectral analysis in rigged Hilbert spaces”
comes from the fact that Dirac’s bra and ket formalism is widely and
successfully employed in theoretical physics, but its mathematical foundation is not
easily accessible. This chapter offers a nearly self-contained mathematical basis for
this formalism and proves in particular the completeness of the set of generalized
eigenfunctions.

In Chap. 30 “Operator algebras and positive mappings,” we study in detail pos- 
itive and completely positive mappings on the algebra B(H) of all bounded linear 
operators on a Hilbert space H or on its subalgebras. We explain the GNS
construction for positive linear functionals in detail and characterize states, i.e., 
normalized positive linear functionals, in terms of their equivalent continuity proper- 
ties (normal, completely additive, tracial). Next, Stinespring’s factorization theorem 
characterizes completely positive maps in terms of representations. Since all
representations of B(H) are determined too, we can give a self-contained characterization
of all completely positive mappings on B(H). 

The last new chapter, Chap. 31 “Positive mappings in quantum physics,” presents 
several results which are very important for the foundations of quantum physics and 
quantum information theory. We start with a detailed discussion of Gleason’s theorem
on the general form of countably additive probability measures on the set of
projections of a separable Hilbert space. Using some of the results of the previous 
chapter, we then give a self-contained characterization of quantum operations,
specifically quantum channel maps (Kraus form), and conclude with a brief discussion of
the stronger form of these results if the underlying Hilbert space is finite dimensional 
(Choi’s characterization of quantum operations). 

On the basis of the mathematical results obtained in this and earlier chapters, it 
is straightforward to introduce some quite prominent concepts in quantum physics, 
namely open quantum systems, reduced dynamics, and decoherence. This is done in 
the last section of this chapter. 


Bielefeld and Durban Ph. Blanchard 
May 2014 E. Brüning


Preface 


Courses in modern theoretical physics have to assume some basic knowledge of 
the theory of generalized functions (in particular distributions) and of the theory 
of linear operators in Hilbert spaces. Accordingly, the faculty of physics of the 
University of Bielefeld offered a compulsory course Mathematische Methoden der 
Physik for students in the second semester of the second year, which now has been 
given for many years. This course has been offered by the authors over a period 
of about 10 years. The main goal of this course is to provide basic mathematical 
knowledge and skills as they are needed for modern courses in quantum mechanics, 
relativistic quantum field theory, and related areas. The regular repetitions of the 
course allowed, on the one hand, testing of a number of variations of the material 
and, on the other hand, the form of the presentation. From this course, the book 
Distributionen und Hilbertraumoperatoren. Mathematische Methoden der Physik. 
Springer-Verlag Wien, 1993 emerged. The present book is a translated, considerably 
revised, and extended version of this book. It contains much more than this course 
since we added many detailed proofs, many examples, and exercises as well as hints 
linking the mathematical concepts or results to the relevant physical concepts or 
theories. 

This book addresses students of physics who are interested in a conceptually 
and mathematically clear and precise understanding of physical problems, and it 
addresses students of mathematics who want to learn about physics as a source and 
as an area of application of mathematical theories, i.e., all those students with interest 
in the fascinating interaction between physics and mathematics. 

It is assumed that the reader has a solid background in analysis and linear algebra 
(in Bielefeld this means three semesters of analysis and two of linear algebra). On 
this basis the book starts in Part A with an introduction to basic linear functional 
analysis as needed for the Schwartz theory of distributions and continues in Part B 
with the particularities of Hilbert spaces and the core aspects of the theory of linear 
operators in Hilbert spaces. Part C develops the basic mathematical foundations for 
modern computations of the ground state energies and charge densities in atoms 
and molecules, i.e., basic aspects of the direct methods of the calculus of variations 
including constrained minimization. A powerful strategy for solving linear and
nonlinear boundary and eigenvalue problems, which covers the Dirichlet problem and
its nonlinear generalizations, is presented as well. An appendix gives detailed proofs 
of the fundamental principles and results of functional analysis to the extent they are 
needed in our context. 

With great pleasure we would like to thank all those colleagues and friends who 
have contributed to this book through their advice and comments, in particular G. 
Bolz, J. Loviscach, G. Roepstorff, and J. Stubbe. Last but not least we thank the 
editorial team of Birkhäuser Boston for their professional work.


Bielefeld and Durban Ph. Blanchard 
June 2002 E. Brüning


Acknowledgements 


It is with great pleasure that we thank our colleague and friend Shigeaki Nagamachi 
(Tokushima, Japan) for his help with the proofreading of the new chapters and a
number of suggestions for improving our presentation. Our thanks go also to our 
colleagues Florian Scheck and Ludwig Streit and to many students. They helped us 
to eliminate quite a number of (small) errors. 

Last, but not least, we thank the editorial team of Birkhäuser for their professional
work. 


Bielefeld and Durban Ph. Blanchard 
May 2014 E. Brüning




Contents 


Part I Distributions

1 Introduction
  Reference
2 Spaces of Test Functions
  2.1 Hausdorff Locally Convex Topological Vector Spaces
    2.1.1 Examples of HLCTVS
    2.1.2 Continuity and Convergence in a HLCTVS
  2.2 Basic Test Function Spaces of Distribution Theory
    2.2.1 The Test Function Space $\mathcal{D}(\Omega)$ of $C^\infty$ Functions of Compact Support
    2.2.2 The Test Function Space $\mathcal{S}(\Omega)$ of Strongly Decreasing $C^\infty$-Functions
    2.2.3 The Test Function Space $\mathcal{E}(\Omega)$ of All $C^\infty$-Functions on $\Omega$
    2.2.4 Relation Between the Test Function Spaces $\mathcal{D}(\Omega)$, $\mathcal{S}(\Omega)$, and $\mathcal{E}(\Omega)$
  2.3 Exercises
  References
3 Schwartz Distributions
  3.1 The Topological Dual of an HLCTVS
  3.2 Definition of Distributions
    3.2.1 The Regular Distributions
    3.2.2 Some Standard Examples of Distributions
  3.3 Convergence of Sequences and Series of Distributions
  3.4 Localization of Distributions
  3.5 Tempered Distributions and Distributions with Compact Support
  3.6 Exercises
4 Calculus for Distributions
  4.1 Differentiation
  4.2 Multiplication
  4.3 Transformation of Variables
  4.4 Some Applications
    4.4.1 Distributions with Support in a Point
    4.4.2 Renormalization of $\left(\frac{1}{x}\right)_+ = \frac{\theta(x)}{x}$
  4.5 Exercises
  References
5 Distributions as Derivatives of Functions
  5.1 Weak Derivatives
  5.2 Structure Theorem for Distributions
  5.3 Radon Measures
  5.4 The Case of Tempered and Compactly Supported Distributions
  5.5 Exercises
  References
6 Tensor Products
  6.1 Tensor Product for Test Function Spaces
  6.2 Tensor Product for Distributions
  6.3 Exercises
  Reference
7 Convolution Products
  7.1 Convolution of Functions
  7.2 Regularization of Distributions
  7.3 Convolution of Distributions
  7.4 Exercises
  References
8 Applications of Convolution
  8.1 Symbolic Calculus: Ordinary Linear Differential Equations
  8.2 Integral Equation of Volterra
  8.3 Linear Partial Differential Equations with Constant Coefficients
  8.4 Elementary Solutions of Partial Differential Operators
    8.4.1 The Laplace Operator $\Delta_n = \sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2}$ in $\mathbb{R}^n$
    8.4.2 The PDE Operator $\frac{\partial}{\partial t} - \Delta_n$ of the Heat Equation in $\mathbb{R}^{n+1}$
    8.4.3 The Wave Operator $\Box_4 = \partial_0^2 - \Delta_3$ in $\mathbb{R}^4$
  8.5 Exercises
  References
9 Holomorphic Functions
  9.1 Hypoellipticity of $\bar{\partial}$
  9.2 Cauchy Theory
  9.3 Some Properties of Holomorphic Functions
  9.4 Exercises
  References
10 Fourier Transformation
  10.1 Fourier Transformation for Integrable Functions
  10.2 Fourier Transformation on $\mathcal{S}(\mathbb{R}^n)$
  10.3 Fourier Transformation for Tempered Distributions
  10.4 Some Applications
    10.4.1 Examples of Tempered Elementary Solutions
    10.4.2 Summary of Properties of the Fourier Transformation
  10.5 Exercises
  References
11 Distributions as Boundary Values of Analytic Functions
  11.1 Exercises
  References
12 Other Spaces of Generalized Functions
  12.1 Generalized Functions of Gelfand Type S
  12.2 Hyperfunctions and Fourier Hyperfunctions
  12.3 Ultradistributions
  References
13 Sobolev Spaces
  13.1 Motivation
  13.2 Basic Definitions
  13.3 The Basic Estimates
    13.3.1 Morrey’s Inequality
    13.3.2 Gagliardo-Nirenberg-Sobolev Inequality
  13.4 Embeddings of Sobolev Spaces
    13.4.1 Continuous Embeddings
    13.4.2 Compact Embeddings
  13.5 Exercises
  References


Part II Hilbert Space Operators

14 Hilbert Spaces: A Brief Historical Introduction
  14.1 Survey: Hilbert Spaces
  14.2 Some Historical Remarks
  14.3 Hilbert Spaces and Physics
  References
15 Inner Product Spaces and Hilbert Spaces
  15.1 Inner Product Spaces
    15.1.1 Basic Definitions and Results
    15.1.2 Basic Topological Concepts
    15.1.3 On the Relation Between Normed Spaces and Inner Product Spaces
    15.1.4 Examples of Hilbert Spaces
  15.2 Exercises
  References
16 Geometry of Hilbert Spaces
  16.1 Orthogonal Complements and Projections
  16.2 Gram Determinants
  16.3 The Dual of a Hilbert Space
  16.4 Exercises
17 Separable Hilbert Spaces
  17.1 Basic Facts
  17.2 Weight Functions and Orthogonal Polynomials
  17.3 Examples of Complete Orthonormal Systems for $L^2(I, \rho\,dx)$
  17.4 Exercises
  References
18 Direct Sums and Tensor Products
  18.1 Direct Sums of Hilbert Spaces
  18.2 Tensor Products
  18.3 Some Applications of Tensor Products and Direct Sums
    18.3.1 State Space of Particles with Spin
    18.3.2 State Space of Multi Particle Quantum Systems
  18.4 Exercises
  References
19 Topological Aspects
  19.1 Compactness
  19.2 The Weak Topology
  19.3 Exercises
  Reference
20 Linear Operators
  20.1 Basic Facts
  20.2 Adjoints, Closed and Closable Operators
  20.3 Symmetric and Self-Adjoint Operators
  20.4 Examples
    20.4.1 Operator of Multiplication
    20.4.2 Momentum Operator
    20.4.3 Free Hamilton Operator
  20.5 Exercises
21 Quadratic Forms
  21.1 Basic Concepts. Examples
  21.2 Representation of Quadratic Forms
  21.3 Some Applications
  21.4 Exercises
22 Bounded Linear Operators
  22.1 Preliminaries
  22.2 Examples
  22.3 The Space B(H, K) of Bounded Linear Operators
  22.4 The C*-Algebra B(H)
  22.5 Calculus in the C*-Algebra B(H)
    22.5.1 Preliminaries
    22.5.2 Polar Decomposition of Operators
  22.6 Exercises
  References
23 Special Classes of Linear Operators
  23.1 Projection Operators
  23.2 Unitary Operators
    23.2.1 Isometries
    23.2.2 Unitary Operators and Groups of Unitary Operators
    23.2.3 Examples of Unitary Operators
  23.3 Some Applications of Unitary Operators in Ergodic Theory
    23.3.1 Poincaré Recurrence Results
    23.3.2 The Mean Ergodic Theorem of von Neumann
  23.4 Self-Adjoint Hamilton Operators
    23.4.1 Kato Perturbations
    23.4.2 Kato Perturbations of the Free Hamiltonian
  23.5 Exercises
  References
24 Elements of Spectral Theory
  24.1 Basic Concepts and Results
  24.2 The Spectrum of Special Operators
  24.3 Comments on Spectral Properties of Linear Operators
  24.4 Exercises
  Reference
25 Compact Operators
  25.1 Basic Theory
  25.2 Spectral Theory
    25.2.1 The Results of Riesz and Schauder
    25.2.2 The Fredholm Alternative
  25.3 Exercises
  Reference
26 Hilbert-Schmidt and Trace Class Operators
  26.1 Basic Theory
  26.2 Dual Spaces of the Spaces of Compact and of Trace Class Operators
  26.3 Related Locally Convex Topologies on B(H)
  26.4 Partial Trace and Schmidt Decomposition in Separable Hilbert Spaces
    26.4.1 Partial Trace
    26.4.2 Schmidt Decomposition
  26.5 Some Applications in Quantum Mechanics
  26.6 Exercises
  References
27 The Spectral Theorem
  27.1 Geometric Characterization of Self-Adjointness
    27.1.1 Preliminaries
    27.1.2 Subspaces of Controlled Growth
  27.2 Spectral Families and Their Integrals
    27.2.1 Spectral Families
    27.2.2 Integration with Respect to a Spectral Family
  27.3 The Spectral Theorem
  27.4 Some Applications
  27.5 Exercises
  References
28 Some Applications of the Spectral Representation
  28.1 Functional Calculus
  28.2 Decomposition of the Spectrum: Spectral Subspaces
  28.3 Interpretation of the Spectrum of a Self-Adjoint Hamiltonian
  28.4 Probabilistic Description of Commuting Observables
  28.5 Exercises
  References
29 Spectral Analysis in Rigged Hilbert Spaces
  29.1 Rigged Hilbert Spaces
    29.1.1 Motivation for the Use of Generalized Eigenfunctions
    29.1.2 Rigged Hilbert Spaces
    29.1.3 Examples of Nuclear Spaces
    29.1.4 Structure of the Natural Embedding in a Gelfand Triple
  29.2 Spectral Analysis of Self-adjoint Operators and Generalized Eigenfunctions
    29.2.1 Direct Integral of Hilbert Spaces
    29.2.2 Classical Versions of Spectral Representation
    29.2.3 Generalized Eigenfunctions
    29.2.4 Completeness of Generalized Eigenfunctions
  29.3 Exercises
  References
30 Operator Algebras and Positive Mappings
  30.1 Representations of C*-Algebras
    30.1.1 Representations of B(H)
  30.2 On Positive Elements and Positive Functionals
    30.2.1 The GNS-Construction
  30.3 Normal States
  30.4 Completely Positive Maps
    30.4.1 Positive Elements in $M_k(\mathcal{A})$
    30.4.2 Some Basic Properties of Positive Linear Mappings
    30.4.3 Completely Positive Maps Between C*-Algebras
    30.4.4 Stinespring Factorization Theorem for Completely Positive Maps
    30.4.5 Completely Positive Mappings on B(H)
  30.5 Exercises
  References
31 Positive Mappings in Quantum Physics
  31.1 Gleason’s Theorem
  31.2 Kraus Form of Quantum Operations
    31.2.1 Operations and Effects
    31.2.2 The Representation Theorem for Operations
  31.3 Choi’s Results for Finite Dimensional Completely Positive Maps
  31.4 Open Quantum Systems, Reduced Dynamics and Decoherence
  31.5 Exercises
  References


Part III Variational Methods

32 Introduction
  32.1 Roads to Calculus of Variations
  32.2 Classical Approach Versus Direct Methods
  32.3 The Objectives of the Following Chapters
  References
33 Direct Methods in the Calculus of Variations
  33.1 General Existence Results
  33.2 Minimization in Banach Spaces
  33.3 Minimization of Special Classes of Functionals
  33.4 Exercises
  References
34 Differential Calculus on Banach Spaces and Extrema of Functions
  34.1 The Fréchet Derivative
  34.2 Extrema of Differentiable Functions
  34.3 Convexity and Monotonicity
  34.4 Gâteaux Derivatives and Variations
  34.5 Exercises
  Reference
35 Constrained Minimization Problems (Method of Lagrange Multipliers)
  35.1 Geometrical Interpretation of Constrained Minimization
  35.2 Tangent Spaces of Level Surfaces
  35.3 Existence of Lagrange Multipliers
    35.3.1 Comments on Dido’s Problem
  35.4 Exercises
  References
36 Boundary and Eigenvalue Problems
  36.1 Minimization in Hilbert Spaces
  36.2 The Dirichlet-Laplace Operator and Other Elliptic Differential Operators
  36.3 Nonlinear Convex Problems
  36.4 Exercises
  References
37 Density Functional Theory of Atoms and Molecules
  37.1 Introduction
  37.2 Semiclassical Theories of Density Functionals
  37.3 Hohenberg-Kohn Theory
    37.3.1 Hohenberg-Kohn Variational Principle
    37.3.2 The Kohn-Sham Equations
  37.4 Exercises
  Reference

Appendix A Completion of Metric Spaces
Appendix B Metrizable Locally Convex Topological Vector Spaces
Appendix C The Theorem of Baire
Appendix D Bilinear Functionals


Notation 


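$\mathbb{N}$    the natural numbers
$\mathbb{N}_0$    $\mathbb{N} \cup \{0\}$
$\mathbb{R}$    field of real numbers
$\mathbb{C}$    field of complex numbers
$\mathbb{K}$    field of real or of complex numbers
$\mathbb{R}^+$    the set of nonnegative real numbers
$\mathbb{K}^n$    $\mathbb{K}$ vector space of $n$-tuples of numbers in $\mathbb{K}$
$A \pm B$    $\{c \in V;\ c = a \pm b;\ a \in A;\ b \in B\}$ for subsets $A$ and $B$ of a vector space $V$
$\Lambda M$    $\{\lambda \cdot u;\ \lambda \in \Lambda,\ u \in M\}$ for a subset $\Lambda \subset \mathbb{K}$ and a subset $M$ of a vector space $V$ over $\mathbb{K}$
$A \setminus B$    the set of all points in a set $A$ which do not belong to the subset $B$ of $A$
$\operatorname{Re} z$    real part of $z \in \mathbb{C}$
$\operatorname{Im} z$    imaginary part of $z \in \mathbb{C}$
$C(\Omega) \equiv C(\Omega; \mathbb{K})$    vector space of all continuous functions $f : \Omega \to \mathbb{K}$, for an open set $\Omega \subset \mathbb{K}^n$
$C_b(\Omega)$    all bounded functions $f \in C(\Omega)$ equipped with sup-norm
$C_0(\Omega)$    vector space of all continuous functions $f : \Omega \to \mathbb{K}$ with compact support in $\Omega$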
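$C^k(\Omega)$    vector space of all functions which have continuous derivatives up to order $k$, for $k = 0, 1, 2, \ldots$
$D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$    derivative monomial of order $|\alpha| = \alpha_1 + \cdots + \alpha_n$, defined on spaces $C^k(\Omega)$, for open sets $\Omega \subset \mathbb{R}^n$ and $k \geq |\alpha|$
$C_b^{0,\alpha}(\Omega)$    space of all $f \in C_b(\Omega)$ for which $Q_\alpha(f) = \sup_{x, y \in \Omega,\ x \neq y} \frac{|f(x) - f(y)|}{|x - y|^\alpha} < \infty$, for $0 < \alpha \leq 1$; Hölder space of exponent $\alpha$
$C_b^k(\Omega)$    space of all $f \in C^k(\Omega)$ for which $D^\beta f \in C_b(\Omega)$ for all $|\beta| \leq k$, for $k \in \mathbb{N}_0$, with norm $\|f\|_{C_b^k} = \max_{|\beta| \leq k} \sup_{x \in \Omega} |D^\beta f(x)|$
$C_b^{k,\alpha}(\Omega)$    for $k \in \mathbb{N}_0$ and $0 < \alpha \leq 1$, space of all $f \in C_b^k(\Omega)$ for which $\|f\|_{C_b^{k,\alpha}} = \|f\|_{C_b^k} + \max_{|\beta| = k} Q_\alpha(D^\beta f) < \infty$
$\operatorname{supp} f$    support of the function $f$
$\mathcal{D}_K(\Omega)$    vector space of all functions $f : \Omega \to \mathbb{K}$ which have continuous derivatives of any order and which have a compact support $\operatorname{supp} f$ contained in the compact subset $K$ of $\Omega \subset \mathbb{R}^n$, equipped with the topology of uniform convergence of all derivatives
$p_{K,m}$    for $m = 0, 1, 2, \ldots$, $K \subset \Omega$, $K$ compact, $\Omega \subset \mathbb{R}^n$ open, the semi-norm on $\mathcal{D}_K(\Omega)$ defined by $p_{K,m}(f) = \sup_{|\alpha| \leq m,\ x \in K} |D^\alpha f(x)|$
$q_{K,m}$    the semi-norm on $\mathcal{D}_K(\Omega)$ defined by $q_{K,m}(f) = \left( \sum_{|\alpha| \leq m} \int_K |D^\alpha f(x)|^2 \, dx \right)^{1/2}$; $K$, $m$, $\Omega$ as above
$\mathcal{D}(\Omega)$    inductive limit of the spaces $\mathcal{D}_K(\Omega)$ with respect to all subsets $K \subset \Omega$, $K$ compact; test function space of all $C^\infty$-functions $f : \Omega \to \mathbb{K}$ which have a compact support in the open set $\Omega \subset \mathbb{R}^n$
$C_c^\infty(\mathbb{R}^n)$    the vector space of all compactly supported $C^\infty$ functions on $\mathbb{R}^n$
$|x|$    Euclidean norm $\sqrt{x_1^2 + \cdots + x_n^2}$ of the vector $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$
$\mathcal{S}(\Omega)$    test function space of all $C^\infty$-functions $f : \Omega \to \mathbb{K}$ which, together with all their derivatives, decrease faster than $\text{const.}\,(1 + |x|)^{-k}$ for $k = 0, 1, 2, \ldots$, for some constant and $x \in \Omega$
$p_{m,k}$    the norm on $\mathcal{S}(\mathbb{R}^n)$ defined by $p_{m,k}(f) = \sup_{x \in \mathbb{R}^n,\ |\alpha| \leq k} (1 + x^2)^{m/2} |D^\alpha f(x)|$ for $m, k = 0, 1, 2, \ldots$
$\mathcal{E}(\Omega)$    test function space of all $C^\infty$-functions $f : \Omega \to \mathbb{K}$, equipped with the topology of uniform convergence of all derivatives $f^{(\alpha)} = D^\alpha f$ on all compact subsets $K$ of $\Omega$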


lctvs    locally convex topological vector space
hlctvs    Hausdorff locally convex topological vector space
$X^*$    algebraic dual of a vector space $X$
$X'$    topological dual of a topological vector space $X$
$\mathcal{D}'(\Omega) \equiv \mathcal{D}(\Omega)'$    space of all distributions on the open set $\Omega \subset \mathbb{R}^n$
$\mathcal{S}'(\Omega) \equiv \mathcal{S}(\Omega)'$    space of all tempered (i.e., slowly growing) distributions on $\Omega \subset \mathbb{R}^n$
$\mathcal{E}'(\Omega) \equiv \mathcal{E}(\Omega)'$    space of all distributions on $\Omega \subset \mathbb{R}^n$ with compact support
$I_f$    the regular distribution defined by the locally integrable function $f$
$\mathcal{D}'_{reg}(\Omega)$    the space of all regular distributions on the open set $\Omega \subset \mathbb{R}^n$
$\mathcal{D}'_+(\mathbb{R})$    space of all distributions on $\mathbb{R}$ with support in $\mathbb{R}^+$
$L^p(\Omega)$    space of equivalence classes of Lebesgue measurable functions on $\Omega \subset \mathbb{R}^n$ for which $|f|^p$ is Lebesgue integrable over $\Omega$; $1 \leq p < \infty$, $\Omega$ Lebesgue measurable
$\|f\|_p$    norm of $L^p(\Omega)$ defined by $\|f\|_p^p = \int_\Omega |f(x)|^p \, dx$
$L^\infty(\Omega)$    space of all equivalence classes of Lebesgue measurable functions on $\Omega$ which are essentially bounded; $\Omega \subset \mathbb{R}^n$ Lebesgue measurable
$L^p_{loc}(\Omega)$    all $f$ as for $L^p(\Omega)$, but $|f|^p$ only integrable over every compact set $K$ in $\Omega$, with system of semi-norms $\|f\|_{p,K} = \|\chi_K f\|_p$, $K \subset \Omega$ compact, $\chi_K$ = characteristic function of $K$
$B_{p,r}(x_0)$    open ball of radius $r > 0$ and centre $x_0$, with respect to the semi-norm $p$
$\delta_a$    Dirac’s delta distribution centered at $x = a \in \mathbb{R}^n$; for $a = 0$ we write $\delta$ instead of $\delta_0$
$\theta$    Heaviside function
$\operatorname{vp} \frac{1}{x}$    Cauchy’s principal value
$\frac{1}{x \pm io}$    $\lim_{\varepsilon \searrow 0} \frac{1}{x \pm i\varepsilon}$ in $\mathcal{D}'(\mathbb{R})$


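$\operatorname{supp} T$    support of a distribution $T$
$\operatorname{sing\,supp} T$    singular support of a distribution $T$
$f \otimes g$    tensor product of two functions $f$ and $g$
$T \otimes S$    tensor product of two distributions $T$ and $S$
$\mathcal{D}(\mathbb{R}^n) \otimes \mathcal{D}(\mathbb{R}^m)$    algebraic tensor product of the test function spaces $\mathcal{D}(\mathbb{R}^n)$ and $\mathcal{D}(\mathbb{R}^m)$
$\mathcal{D}(\mathbb{R}^n) \otimes_\pi \mathcal{D}(\mathbb{R}^m)$    the space $\mathcal{D}(\mathbb{R}^n) \otimes \mathcal{D}(\mathbb{R}^m)$ equipped with the projective tensor product topology
$\mathcal{D}(\mathbb{R}^n) \tilde{\otimes}_\pi \mathcal{D}(\mathbb{R}^m)$    completion of the space $\mathcal{D}(\mathbb{R}^n) \otimes_\pi \mathcal{D}(\mathbb{R}^m)$
$u * v$    convolution of two functions $u$ and $v$
$T * u$    the convolution of a distribution $T \in \mathcal{D}'(\Omega)$ with a test function $u \in \mathcal{D}(\Omega)$; regularization of $T$
$T * S$    convolution of two distributions $T$ and $S$, if defined
$\bar{\partial}$    the differential operator $\frac{1}{2}\left(\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}\right)$ on $\mathcal{D}'(\mathbb{R}^2)$
$\mathcal{F}$    operator of Fourier transform, on $L^1(\mathbb{R}^n)$ or $\mathcal{S}(\mathbb{R}^n)$
$\mathcal{F}_2$    unitary operator of Fourier transform on the Hilbert space $L^2(\mathbb{R}^n)$
$\mathcal{F}'$    Fourier transform on $\mathcal{S}'(\mathbb{R}^n)$
$C^{[k,p]}(\Omega)$    vector space of all functions $f \in C^k(\Omega)$ which have derivatives $D^\alpha f \in L^p(\Omega)$ up to order $k$, for $k = 0, 1, 2, \ldots$ and $1 \leq p < \infty$
$W^{k,p}(\Omega)$    Sobolev space of order $(k, p)$, for $k = 0, 1, 2, \ldots$ and $1 \leq p < \infty$, completion of $C^{[k,p]}(\Omega)$ with respect to the norm $\|\cdot\|_{k,p}$ defined by $\|u\|_{k,p}^p = \sum_{|\alpha| \leq k} \|D^\alpha u\|^p_{L^p(\Omega)}$, or
$W^{k,p}(\Omega)$    according to Meyers-Serrin: the space of all $u \in L^p(\Omega)$ which have weak derivatives $D^\alpha u$ in $L^p(\Omega)$ for $|\alpha| \leq k$
$W^{k,p}_{loc}(\Omega)$    as $W^{k,p}(\Omega)$ with $L^p(\Omega)$ replaced by $L^p_{loc}(\Omega)$
$H^k(\Omega)$    the Hilbert space $W^{k,2}(\Omega)$ with inner product $\langle f, g \rangle_{H^k(\Omega)} = \sum_{|\alpha| \leq k} \langle D^\alpha f, D^\alpha g \rangle_{L^2(\Omega)}$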


$p^*$    Sobolev conjugate exponent of $p$ defined by $p^* = \frac{np}{n - p}$ for $1 \leq p < n$ and $p^* = \infty$ for $p = n$
$\langle \cdot, \cdot \rangle$    inner product on a vector space
$\|\cdot\|$    norm on a vector space
$\ell^2(\mathbb{K})$    Hilbert space of square summable sequences of numbers in $\mathbb{K}$
$M^\perp$    orthogonal complement of a set $M$ in a Hilbert space
$\operatorname{lin} M$    the linear span of the set $M$ in a vector space
$[M]$    the closure of $\operatorname{lin} M$ in a topological vector space, i.e., the smallest closed subspace which contains $M$
ONS    orthonormal system in a Hilbert space
ONB    orthonormal basis in a (separable) Hilbert space
$\dim V$    dimension of a vector space $V$
$D(A)$    domain (of definition) of the (linear) operator $A$
$\ker A = N(A)$    the kernel or null-space of a linear operator $A$
$\operatorname{ran} A$    the range or set of values of a linear operator $A$
$\Gamma(A)$    graph of a linear operator $A$
$A^*$    the adjoint of the densely defined linear operator $A$
$A_F$    Friedrichs extension of the densely defined nonnegative linear operator $A$
$A \dot{+} B$    form sum of the linear operators $A$ and $B$
$\mathcal{L}(X, Y)$    space of continuous linear operators $X \to Y$, $X$ and $Y$ topological vector spaces over the field $\mathbb{K}$
$\mathcal{B}(\mathcal{H}) = \mathcal{B}(\mathcal{H}, \mathcal{H})$    space of bounded linear operators on a Hilbert space $\mathcal{H}$
$\hat{A} = (D, A)$    linear operator with domain $D$ and rule of assignment $A$
$\mathcal{B}_c(\mathcal{H})$    space of compact operators on a Hilbert space $\mathcal{H}$
$\mathcal{B}_1(\mathcal{H})$    the space of all trace class operators on $\mathcal{H}$


$\mathcal{B}_2(\mathcal{H})$    the space of all Hilbert-Schmidt operators on $\mathcal{H}$
$[e, f]$    the finite rank operator on a Hilbert space $\mathcal{H}$ defined by $[e, f]x = \langle f, x \rangle e$ for $x \in \mathcal{H}$, for any given (unit) vectors $e, f \in \mathcal{H}$
$\mathcal{U}(\mathcal{H})$    space of all unitary operators on a Hilbert space $\mathcal{H}$
$\rho(A)$    resolvent set of a linear operator $A$
$R_A(z)$    resolvent operator at the point $z \in \rho(A)$ for the linear operator $A$
$\sigma(A)$    $= \mathbb{C} \setminus \rho(A)$, spectrum of the linear operator $A$
$\sigma_p(A)$    point spectrum of $A$
$\sigma_c(A)$    $= \sigma(A) \setminus \sigma_p(A)$, continuous spectrum of $A$
$\sigma_d(A)$    discrete spectrum of $A$
$\sigma_{ac}(A)$    absolutely continuous spectrum of $A$
$\sigma_{sc}(A)$    singular continuous spectrum of $A$
$\mathcal{H}_p(A)$    discontinuous subspace of $A$
$\mathcal{H}_c(A)$    continuous subspace of $A$
$\mathcal{H}_{sc}(A)$    singular continuous subspace of a self-adjoint operator $A$
$\mathcal{H}_{ac}(A)$    $= \mathcal{H}_c(A) \cap \mathcal{H}_{sc}(A)^\perp$, absolutely continuous subspace of a self-adjoint operator $A$
$\mathcal{H}_s(A)$    $= \mathcal{H}_p(A) \oplus \mathcal{H}_{sc}(A)$, singular subspace of a self-adjoint operator $A$
$M_b(H)$    subspace of bounded states of a self-adjoint Schrödinger operator $H$
$M_\infty(H)$    subspace of scattering states of $H$, $H$ as above
$P_M$    orthogonal projection operator onto the closed subspace $M$ of a Hilbert space
$[f \leq r]$    for a function $f : M \to \mathbb{R}$ and $r \in \mathbb{R}$ the sub-level set $\{x \in M : f(x) \leq r\}$
$P_K$    projection onto the closed convex subset $K$ of a Hilbert space $\mathcal{H}$


$[f = c]$    for a function $f : M \to \mathbb{R}$ and $c \in \mathbb{R}$ the level set $\{x \in M : f(x) = c\}$
$f'(x) = D_x f = Df(x)$    the Fréchet derivative of a function $f : U \to F$ at a point $x \in U$, for $U \subset E$ open, $E$, $F$ Banach spaces
$\mathcal{B}(E^{\times n}, F)$    the Banach space of all continuous $n$-linear operators $E^{\times n} = E \times \cdots \times E \to F$, for Banach spaces $E$, $F$
$\delta f(x_0, h)$    Gâteaux differential of a function $f : U \to F$ at a point $x_0 \in U$ in the direction $h \in E$, $U \subset E$ open, $E$, $F$ Banach spaces
$\delta_{x_0} f(h)$    Gâteaux derivative of $f$ at $x_0 \in U$, applied to $h \in E$
$\Delta^n f(x_0, h)$    $= \frac{d^n}{dt^n} f(x_0 + th)\big|_{t=0}$, $n$th variation of a function $f$ at the point $x_0$ in the direction $h$
$T_x M$    tangent space of the differential manifold $M$ at the point $x \in M$


Part I 
Distributions 


Chapter 1 
Introduction 


One of the earliest and most famous examples of a generalized function or distribution
is “Dirac’s delta function.” It was originally defined by Dirac (1926-1927) as a
function

$$\mathbb{R} \ni x \mapsto \delta_{x_0}(x) \in \overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$$

with the following properties ($x_0$ is a given real number):

(a) $\delta_{x_0}(x) = \begin{cases} 0 & : \ x \in \mathbb{R},\ x \neq x_0, \\ +\infty & : \ x = x_0. \end{cases}$

(b) $\int_{\mathbb{R}} f(x)\,\delta_{x_0}(x)\,dx = f(x_0)$ for all sufficiently smooth functions $f : \mathbb{R} \to \mathbb{R}$.

However, elementary results from integration theory show that the conditions (a) and
(b) contradict each other. Indeed, by (a), $f(x)\delta_{x_0}(x) = 0$ for almost all $x \in \mathbb{R}$ (with
respect to the Lebesgue measure on $\mathbb{R}$), and thus the Lebesgue integral of $f(x)\delta_{x_0}(x)$ vanishes:

$$\int_{\mathbb{R}} f(x)\,\delta_{x_0}(x)\,dx = 0,$$

and this contradicts (b) for all $f$ with $f(x_0) \neq 0$. An appropriate reading of condition
(b) is to interpret $\delta_{x_0}(x)\,dx$ as a measure of total mass 1 which is concentrated
in $x = x_0$. But this is in conflict with condition (a).

Nevertheless, physicists continued to work with this contradictory object quite 
successfully, in the sense of formal calculations. This showed that this mathematical 
object was useful in principle. In addition numerous other examples hinted at the 
usefulness of mathematical objects similar to Dirac’s distribution. These objects, 
respectively concepts, were introduced initially in an often rather vague way in order 
to deal with concrete problems. The concepts we have in mind here were mainly those 
which later in the theory of generalized functions found their natural formulation 
as weak derivative, generalized solution, Green’s function etc. This is to say that 
distribution theory should be considered as the natural result, through a process of 


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_1


synthesis and simplification, of several attempts to extend classical analysis which 
arose from various concrete problems. With the formulation of distribution theory 
one had an analogous situation to the invention of differential and integral calculus 
by Leibniz and Newton. In both cases many, mainly ad hoc, methods were known
for the solution of many concrete problems, which then found their “synthesis and
simplification” in a comprehensive theory.

The main contributions to the development of distribution theory came from S. 
Bochner, J. Leray, K. Friedrichs, S. Sobolev, I. M. Gelfand, and, in particular, Lau- 
rent Schwartz (1945-1949). New general ideas and methods from topology and 
functional analysis were used, mainly by L. Schwartz, in order to solve many, often 
old, problems and to extract their common general mathematical framework. Dis- 
tribution theory, as created through this process, allows us to consider well-defined 
mathematical objects with the conditions (a) and (b) from above by giving these 
conditions a new interpretation. In a first step, condition (b) becomes the definition
of an object $\delta_{x_0}$ which generalizes the concept of the Lebesgue integral in the original
formulation, i.e., our preliminary definition for $\delta_{x_0}$ reads ($\mathcal{F}$ denotes the vector space
of all functions $f : \mathbb{R} \to \mathbb{C}$):

$$\delta_{x_0} : \{f \in \mathcal{F};\ f \text{ sufficiently smooth}\} \to \mathbb{C} \quad \text{defined by} \quad \delta_{x_0}(f) = f(x_0).$$


According to this, $\delta_{x_0}$ assigns numbers to sufficiently smooth functions $f$ in a linear
way, just as ordinary integrals

$$I_g(f) = \int_{\mathbb{R}} g(x) f(x)\,dx$$

if they do exist (here $g$ is a given function). Property (a) then becomes a “support
property” of this newly defined object on a vector space of sufficiently smooth
functions:

$$\delta_{x_0}(f) = 0 \quad \text{whenever } f(x_0) = 0.$$


In this sense one can also consider functions as “linear functions” or “functionals” on
a suitable vector space of functions $\phi$. The idea is quite simple: Consider the vector
space $C_0(\mathbb{R}^n)$ of continuous functions $\phi : \mathbb{R}^n \to \mathbb{C}$ with compact support $\operatorname{supp} \phi$.
Recall: The support of a function is by definition the closure of the set of those points
where the function does not vanish, i.e.,

$$\operatorname{supp} \phi = \overline{\{x \in \mathbb{R}^n : \phi(x) \neq 0\}}.$$

Then every continuous function $g$ on $\mathbb{R}^n$ can be considered, in a natural way, as a
linear functional $I_g$ on the vector space $C_0(\mathbb{R}^n)$ by defining

$$I_g(\phi) = \int_{\mathbb{R}^n} g(x)\phi(x)\,dx. \qquad (1.1)$$
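As a rough numerical illustration of (1.1) in one dimension, the following sketch evaluates a functional of the form $\phi \mapsto \int g(x)\phi(x)\,dx$ by a midpoint rule and checks its linearity. The helper names (`integrate`, `bump`, `regular_distribution`, `narrow_gaussian`), the choice $g = \cos$, and the particular test function are assumptions made for this example only. The last lines indicate, again only numerically, how a sharply peaked $g$ of total mass 1 yields values close to $\phi(0)$, in the spirit of condition (b).

```python
import math

def integrate(h, a, b, n=20000):
    """Composite midpoint rule for the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

def bump(x):
    """A smooth function with compact support [-1, 1] (illustrative choice)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def regular_distribution(g, a=-1.0, b=1.0):
    """Return phi -> integral of g * phi over [a, b]; [a, b] must contain supp phi."""
    return lambda phi: integrate(lambda x: g(x) * phi(x), a, b)

I_g = regular_distribution(math.cos)
phi = bump
psi = lambda x: x * bump(x)

# Linearity: I_g(2 phi + 3 psi) = 2 I_g(phi) + 3 I_g(psi)
lhs = I_g(lambda x: 2.0 * phi(x) + 3.0 * psi(x))
rhs = 2.0 * I_g(phi) + 3.0 * I_g(psi)

def narrow_gaussian(eps):
    """Gaussian density of total mass 1 and width eps, concentrated near 0."""
    return lambda x: math.exp(-x * x / (2.0 * eps * eps)) / (eps * math.sqrt(2.0 * math.pi))

# For small eps the pairing is close to phi(0)
approx_delta = regular_distribution(narrow_gaussian(0.01))(phi)
```

The Gaussian family is only one convenient choice; nothing in the text above singles it out.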




When we think about the fact that the values of measurements of physical quantities 
are obtained by an averaging process, then the interpretation appears reasonable that 
many physical quantities can be described mathematically only by objects of the 
type (1.1). Later, when we have progressed with the precise formulation, we will 
call objects of the type (1.1) regular distributions. Distributions are a special class of 
generalized functions which indeed generalize functions along the lines indicated in 
(1.1). This will be discussed in more detail later. The theory of generalized functions 
has been developed to overcome various difficulties in classical analysis, in particular 
the following problems: 


(i) the existence of continuous but not differentiable functions (B. Riemann 1861,
K. Weierstraß 1872), e.g., $f(x) = \sum_{n=1}^{\infty} \frac{\sin(n^2 x)}{n^2}$;

(ii) the problem of interchangeability of limit operations.

A brief illustration of the kind of problems we have in mind in (ii) is the existence of
sequences of $C^\infty$-functions $f_n$ which converge uniformly to a limit function which
is of class $C^\infty$ too, but the sequence of derivatives does not converge (in the sense
of classical analysis). A simple example is the sequence $f_n(x) = \frac{1}{n} \sin nx$, which
converges to 0 uniformly on $\mathbb{R}$, but the sequence of derivatives $f_n'(x) = \cos nx$ does
not converge, not even pointwise.
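The example can be explored numerically. The sketch below, whose helper names and whose choice of test function are assumptions for illustration, checks that $\sup |f_n| \leq 1/n$ on a grid, that $f_n'(\pi) = \cos n\pi$ keeps alternating between $-1$ and $+1$, and that the averaged quantities $\int f_n'(x)\phi(x)\,dx$ against a fixed smooth compactly supported $\phi$ nevertheless become small. The last observation only hints at the sense in which the derivatives converge in distribution theory; it is not a proof.

```python
import math

def integrate(h, a, b, n=20000):
    """Composite midpoint rule for the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

def bump(x):
    """A smooth function with compact support [-1, 1] (illustrative choice)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def f(n, x):
    return math.sin(n * x) / n

def df(n, x):
    return math.cos(n * x)

# Uniform convergence of f_n to 0: the sup over a grid is bounded by 1/n
grid = [-math.pi + 2.0 * math.pi * i / 2000 for i in range(2001)]
sup_f = {n: max(abs(f(n, x)) for x in grid) for n in (1, 10, 100)}

# No pointwise convergence of the derivatives: at x = pi the values alternate
values_at_pi = [df(n, math.pi) for n in range(1, 7)]

# Averages of f_n' against a fixed test function shrink as n grows
pairing = {n: integrate(lambda x: df(n, x) * bump(x), -1.0, 1.0) for n in (1, 10, 100)}
```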

Our focus will be distribution theory as developed mainly by L. Schwartz.
Chapter 12 discusses some other important classes of generalized functions.

Distribution theory addresses the problem of generalizing the classical concept 
of a function in such a way that the difficulties related to this classical concept are 
resolved in the new theory. In concrete terms, this envisaged generalization of the 
classical concept of functions should satisfy the following four conditions: 


1. Every (locally integrable) function is a distribution.

2. Every distribution is differentiable, and the derivative is again a distribution.

3. As far as possible, the rules of calculation of classical analysis remain valid.

4. In distribution theory the interchangeability of the main limit operations is
guaranteed “automatically.”


As mentioned above, the realization of this program leads to a synthesis and a sim- 
plification. Nevertheless, we do not get mathematically well-defined objects with the 
very convenient properties (1), (2), (3), (4) for free. The mathematical work has to 
be done at the level of definition of these objects. At this point distribution theory 
might appear to be difficult. However, in reality it is quite simple, and for practical 
applications only a rather limited amount of mathematical knowledge is required. 

There are different ways to define distributions; we mention the main three. One 
can define distributions as: 


D1 continuous linear functions on suitable spaces of smooth functions (“test
functions”);

D2 certain equivalence classes of suitable Cauchy sequences of (smooth) functions;

D3 “weak” derivatives of continuous functions (locally).




We consider the first way as the most convenient and most powerful since many results
from functional analysis can be used directly. Accordingly we define distributions
according to D1 and derive D2 and D3 as important characterizations of distributions.


Remark 1.1 Many details about the historical development of distribution theory 
can be found in the book by J. Lützen “The Prehistory of the Theory of Distributions,”
Springer-Verlag 1982. Here we mention only two important aspects very briefly: It 
was not in order to give the Dirac function a mathematical meaning that L. Schwartz 
was interested in what later became the theory of distributions, but in order to solve 
a relatively abstract problem formulated by Choquet and Deny (1944). But without 
hesitation L. Schwartz addressed also practical problems in his new theory. As early 
as 1946 he gave a talk entitled “Generalization of the concepts of functions and 
derivatives” addressing an audience of electrical engineers. 

For a much broader perspective on this subject as part of the theory of topological
vector spaces we recommend the book [1].

As we will learn later Schwartz distributions provide a suitable mathematical 
framework for a solution theory of constant coefficient partial differential operators. 
Nevertheless there are many problems in analysis where this framework is too narrow, 
for instance for a solution theory of linear partial differential operators with real 
analytic coefficients. Accordingly various other spaces of generalized functions have 
been introduced and studied. In Chap. 12 we give a very short overview of the most 
prominent spaces of generalized functions. Not all of them are defined via duality. 

Often in problems of (nonlinear) analysis one has to control the growth of a 
function and the growth of a finite number of its derivatives. Thus for such problems 
it is natural to work in a function space where this control is provided by the definition 
of its norm. One of the simplest but most widely used classes of function spaces is the class
of Sobolev spaces which we introduce and study in Chap. 13. The last part of this
book relies on this class of spaces. 


Reference 


1. Bourbaki N. Éléments d’histoire des mathématiques. Espaces vectoriels topologiques. Paris:
Hermann; 1960.


Chapter 2 
Spaces of Test Functions 


The spaces of test functions we are going to use are vector spaces of smooth (i.e.,
sufficiently often continuously differentiable) functions on open nonempty subsets
$\Omega \subseteq \mathbb{R}^n$ equipped with a “natural” topology. Accordingly we start with a general
method to equip a vector space $V$ with a topology such that the vector space operations
of addition and scalar multiplication become continuous, i.e., such that

$$A : V \times V \to V, \quad A(x, y) = x + y, \quad x, y \in V,$$
$$M : \mathbb{K} \times V \to V, \quad M(\lambda, x) = \lambda x, \quad \lambda \in \mathbb{K},\ x \in V$$


become continuous functions for this topology. This can be done in several different 
but equivalent ways. The way we describe has the advantage of being the most 
natural one for the spaces of test functions we want to construct. A vector space V 
which is equipped with a topology $\mathcal{T}$ such that the functions $A$ and $M$ are continuous
is called a topological vector space, usually abbreviated as TVS. The test function 
spaces used in distribution theory are concrete examples of topological vector spaces 
where, however, the topology has the additional property that every point has a 
neighborhood basis consisting of (absolutely) convex sets. These are called locally 
convex topological vector spaces, abbreviated as LCVTVS. 


2.1 Hausdorff Locally Convex Topological Vector Spaces 


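To begin we recall the concept of a topology. To define a topology on a set $X$ means
to define a system $\mathcal{T}$ of subsets of $X$ which has the following properties:

T1 $X, \emptyset \in \mathcal{T}$ ($\emptyset$ denotes the empty set);
T2 $W_i \in \mathcal{T},\ i \in I \Rightarrow \bigcup_{i \in I} W_i \in \mathcal{T}$ ($I$ any index set);
T3 $W_1, \ldots, W_N \in \mathcal{T},\ N \in \mathbb{N} \Rightarrow \bigcap_{j=1}^{N} W_j \in \mathcal{T}$.

The elements of $\mathcal{T}$ are called open and their complements closed sets of the
topological space $(X, \mathcal{T})$.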


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_2


Example 2.1

1. Define $\mathcal{T}_t = \{\emptyset, X\}$. $\mathcal{T}_t$ is called the trivial topology on $X$.

2. Define $\mathcal{T}_d$ to be the system of all subsets of $X$ including $X$ and $\emptyset$. $\mathcal{T}_d$ is called the
discrete topology on $X$.

3. The usual topology on the real line $\mathbb{R}$ has as open sets all unions of open intervals
$]a, b[\; = \{x \in \mathbb{R} : a < x < b\}$.

Note that according to T3 only finite intersections are allowed. If one allowed
the intersection of infinitely many sets here, the resulting concept of a topology would not
be very useful. For instance, every point $a \in \mathbb{R}$ is the intersection of infinitely many
open intervals $I_n = \;]a - \frac{1}{n}, a + \frac{1}{n}[$, i.e., $\{a\} = \bigcap_{n \in \mathbb{N}} I_n$. Hence, if in T3 infinite intersections
were allowed, all points would be open, thus every subset would be open (see discrete
topology), a property which in most cases is not very useful.

If we put any topology on a vector space, it is not assured that the basic vector
space operations of addition and scalar multiplication will be continuous. A fairly
concrete method to define a topology $\mathcal{T}$ on a vector space $V$ so that the resulting
topological space $(V, \mathcal{T})$ is actually a topological vector space is described in the
following paragraphs. The starting point is the concept of a seminorm on a vector
space as a real valued, subadditive, positive homogeneous and symmetric function.


Definition 2.1 Let $V$ be a vector space over $\mathbb{K}$. Any function $q : V \to \mathbb{R}$ with the
properties

(i) $\forall x, y \in V$: $q(x + y) \leq q(x) + q(y)$ (subadditive),
(ii) $\forall \lambda \in \mathbb{K},\ \forall x \in V$: $q(\lambda x) = |\lambda| q(x)$ (symmetric and positive homogeneous),

is called a seminorm on $V$. If a seminorm $q$ has the additional property

(iii) $q(x) = 0 \Rightarrow x = 0$,

then it is called a norm.

There are some immediate consequences which are used very often:

Lemma 2.1 For every seminorm $q$ on a vector space $V$ one has

1. $q(0) = 0$;
2. $\forall x, y \in V$: $|q(x) - q(y)| \leq q(x - y)$;
3. $\forall x \in V$: $0 \leq q(x)$.


Proof The second condition in the definition of a seminorm gives for $\lambda = 0$ that
$q(0x) = 0$. But for any $x \in V$ one has $0x = 0 \in V$, the neutral element in $V$,
and the first part follows. Apply subadditivity of $q$ to $x = y + (x - y)$ to get
$q(x) = q(y + (x - y)) \leq q(y) + q(x - y)$. Similarly one gets for $y = x + (y - x)$
that $q(y) \leq q(x) + q(y - x)$. The symmetry condition (ii) of a seminorm says in
particular $q(-x) = q(x)$, hence $q(x - y) = q(y - x)$, and thus the above two
estimates together say $\pm(q(x) - q(y)) \leq q(x - y)$ and this proves the second part.
For $y = 0$ the second part says $|q(x) - q(0)| \leq q(x)$, hence by observing $q(0) = 0$
we get $|q(x)| \leq q(x)$ and therefore a seminorm takes only nonnegative values and
we conclude.


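Example 2.2

1. It is easy to show that the functions $q_i : \mathbb{R}^n \to \mathbb{R}$ defined by $q_i(x) = |x_i|$ for
$x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ are seminorms on the real vector space $\mathbb{R}^n$ but not norms
if $n > 1$. And it is well known that the system $\mathcal{P} = \{q_1, \ldots, q_n\}$ can be used to
define the usual Euclidean topology on $\mathbb{R}^n$.

2. More generally, consider any vector space $V$ over the field $\mathbb{K}$ and its algebraic
dual space $V^* = L(V; \mathbb{K})$ defined as the set of all linear functions $T : V \to \mathbb{K}$,
i.e., those functions which satisfy

$$T(\alpha x + \beta y) = \alpha T(x) + \beta T(y) \quad \forall x, y \in V,\ \forall \alpha, \beta \in \mathbb{K}.$$

Each such $T \in V^*$ defines a seminorm $q_T$ on $V$ by

$$q_T(x) = |T(x)| \quad \forall x \in V.$$

3. For an open nonempty set $\Omega \subset \mathbb{R}^n$, the set $C^k(\Omega)$ of all functions $f : \Omega \to
\mathbb{K}$ which have continuous derivatives up to order $k$ is actually a vector space
over $\mathbb{K}$ and on it the following functions $p_{K,m}$ and $q_{K,m}$ are indeed seminorms.
Here $K \subset \Omega$ is any compact subset and $k \in \mathbb{N}_0$ is any nonnegative integer. For
$0 \leq m \leq k$ and $\phi \in C^k(\Omega)$ define

$$p_{K,m}(\phi) = \sup_{x \in K,\ |\alpha| \leq m} |D^\alpha \phi(x)|, \qquad (2.1)$$

$$q_{K,m}(\phi) = \left( \sum_{|\alpha| \leq m} \int_K |D^\alpha \phi(x)|^2 \, dx \right)^{1/2}. \qquad (2.2)$$

The notation is as follows. For a multi-index $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}_0^n$ we denote
by $D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$ the derivative monomial of order $|\alpha| = \alpha_1 + \cdots + \alpha_n$, i.e.,

$$D^\alpha \phi(x) = \frac{\partial^{|\alpha|} \phi}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}(x), \quad x = (x_1, \ldots, x_n).$$

Thus, for example for $f \in C^3(\mathbb{R}^3)$, one has in this notation: If $\alpha = (1, 0, 0)$, then
$|\alpha| = 1$ and $D^\alpha f = \frac{\partial f}{\partial x_1}$; if $\alpha = (1, 1, 0)$, then $|\alpha| = 2$ and
$D^\alpha f = \frac{\partial^2 f}{\partial x_1 \partial x_2}$; if $\alpha = (0, 0, 2)$, then $|\alpha| = 2$ and $D^\alpha f = \frac{\partial^2 f}{\partial x_3^2}$;
if $\alpha = (1, 1, 1)$, then $|\alpha| = 3$ and $D^\alpha f = \frac{\partial^3 f}{\partial x_1 \partial x_2 \partial x_3}$.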
A few comments on these examples are in order. The seminorms given in the second 
example play an important role in general functional analysis, those of the third will 
be used later in the definition of the topology on the test function spaces used in 
distribution theory. 
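For the finite-dimensional seminorms of the first two examples, the defining properties and the consequences collected in Lemma 2.1 can be spot-checked numerically. The sketch below works on $\mathbb{R}^3$; the function names, the random sampling and the particular functional $T(x) = x_1 - 2x_2 + 0.5\,x_3$ are assumptions chosen for illustration. A passing check on random samples is of course no proof.

```python
import random

def coordinate_seminorm(i):
    """q_i(x) = |x_i| on R^n (0-based index i)."""
    return lambda x: abs(x[i])

def functional_seminorm(coeffs):
    """q_T(x) = |T(x)| for the linear functional T(x) = sum_j coeffs[j] * x[j]."""
    return lambda x: abs(sum(c * xj for c, xj in zip(coeffs, x)))

def max_of(seminorms):
    """Pointwise maximum of finitely many seminorms."""
    return lambda x: max(q(x) for q in seminorms)

def passes_seminorm_checks(q, dim, trials=1000, tol=1e-9, seed=0):
    """Test subadditivity, homogeneity and Lemma 2.1 (parts 2, 3) on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        lam = rng.uniform(-5.0, 5.0)
        x_plus_y = [a + b for a, b in zip(x, y)]
        x_minus_y = [a - b for a, b in zip(x, y)]
        lam_x = [lam * a for a in x]
        if q(x_plus_y) > q(x) + q(y) + tol:
            return False
        if abs(q(lam_x) - abs(lam) * q(x)) > tol:
            return False
        if abs(q(x) - q(y)) > q(x_minus_y) + tol:
            return False
        if q(x) < 0.0:
            return False
    return True

q1 = coordinate_seminorm(0)
q_T = functional_seminorm([1.0, -2.0, 0.5])
sup_norm = max_of([coordinate_seminorm(i) for i in range(3)])

# q1 and q_T vanish on nonzero vectors, so they are seminorms but not norms
kernel_of_q1 = [0.0, 1.0, 0.0]
kernel_of_q_T = [2.0, 1.0, 0.0]
```

The maximum of all coordinate seminorms vanishes only at the origin, which is why `sup_norm` is a norm while each `coordinate_seminorm(i)` alone is not.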




Recall that in a Euclidean space $\mathbb{R}^n$ the open ball $B_r(x)$ with radius $r > 0$ and
centre $x$ is defined by

$$B_r(x) = \{y \in \mathbb{R}^n : |y - x| < r\}$$

where $|y - x| = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}$ is the Euclidean distance between the points
$y = (y_1, \ldots, y_n)$ and $x = (x_1, \ldots, x_n)$. Similarly one proceeds in a vector space
$V$ on which a seminorm $p$ is given: The open $p$-ball in $V$ with centre $x$ and radius
$r > 0$ is defined by

$$B_{p,r}(x) = \{y \in V : p(y - x) < r\}.$$

In this definition the Euclidean distance is replaced by the semidistance $d_p(y, x)
= p(y - x)$ between the points $y, x \in V$. Note: If $p$ is not a norm, then one can have
$d_p(y, x) = 0$ for $y \neq x$. In this case the open $p$-ball $B_{p,r}(0)$ contains the nontrivial
subspace $N(p) = \{y \in V : p(y) = 0\}$. Nevertheless these $p$-balls share all essential
properties with balls in Euclidean space.

1. $B_{p,r}(x) = x + B_{p,r}$, i.e., every point $y \in B_{p,r}(x)$ has the unique representation
$y = x + z$ with $z \in B_{p,r} \equiv B_{p,r}(0)$;

2. $B_{p,r}$ is circular, i.e., $x \in B_{p,r}$, $\alpha \in \mathbb{K}$, $|\alpha| \leq 1$ implies $\alpha x \in B_{p,r}$;

3. $B_{p,r}$ is convex, i.e., $x, y \in B_{p,r}$ and $0 \leq \lambda \leq 1$ implies $\lambda x + (1 - \lambda) y \in B_{p,r}$;

4. $B_{p,r}$ absorbs the points of $V$, i.e., for every $x \in V$ there is a $\lambda > 0$ such that
$\lambda x \in B_{p,r}$;

5. The nonempty intersection $B_{p,r_1}(x_1) \cap B_{p,r_2}(x_2)$ of two open $p$-balls contains
an open $p$-ball: $B_{p,r}(x) \subset B_{p,r_1}(x_1) \cap B_{p,r_2}(x_2)$.


For the proof of these statements see the Exercises. 
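Before working through the exercises, it can help to look at a concrete $p$-ball. The sketch below takes the seminorm $p(v) = |v_1|$ on $\mathbb{R}^2$, which is not a norm, and tests membership in open $p$-balls for a handful of hand-picked points. The function name `in_ball` and all sample points are assumptions made for this illustration; the checks exhibit single instances of the properties and prove nothing in general.

```python
def in_ball(p, center, r, y):
    """Membership test for the open p-ball: p(y - center) < r."""
    return p([yi - ci for yi, ci in zip(y, center)]) < r

# Seminorm on R^2 that ignores the second coordinate; its kernel is {(0, t)}
p = lambda v: abs(v[0])

origin = [0.0, 0.0]
r = 0.1

# The ball around 0 contains the whole subspace N(p), however far out we go
kernel_points_inside = all(in_ball(p, origin, r, [0.0, t]) for t in (1.0, 1e3, 1e6))

# Translation: y lies in the ball around x exactly when y - x lies in the ball around 0
x = [1.0, -3.0]
y = [1.05, 7.0]
y_minus_x = [y[0] - x[0], y[1] - x[1]]
translation_agrees = in_ball(p, x, r, y) == in_ball(p, origin, r, y_minus_x)

# One instance each of convexity and circularity
a = [0.09, 5.0]
b = [-0.09, -100.0]
lam = 0.3
convex_combination = [lam * a[0] + (1.0 - lam) * b[0], lam * a[1] + (1.0 - lam) * b[1]]
alpha = -0.5
scaled = [alpha * a[0], alpha * a[1]]
```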

In a finite dimensional vector space all norms are equivalent, i.e., they define the 
same topology. However, this statement does not hold in an infinite dimensional 
vector space (see Exercises). As the above examples indicate, in an infinite dimen- 
sional vector space there are many different seminorms. This raises naturally two 
questions: How do we compare seminorms? When do two systems of seminorms 
define the same topology? A natural way to compare two seminorms is to compare 
their values in all points. Accordingly one has: 


Definition 2.2 For two seminorms $p$ and $q$ on a vector space $V$ one says

a) $p$ is smaller than $q$, in symbols $p \leq q$, if, and only if, $p(x) \leq q(x)$ $\forall x \in V$;
b) $p$ and $q$ are comparable if, and only if, either $p \leq q$ or $q \leq p$.

The seminorms $q_i$ in our first example above are not comparable. Among the
seminorms $q_{K,m}$ and $p_{K,m}$ from the third example there are many which are
comparable. Suppose two compact subsets $K_1$ and $K_2$ satisfy $K_1 \subset K_2$ and the nonnegative
integer $m_1$ is smaller than or equal to the nonnegative integer $m_2$, then obviously

$$p_{K_1,m_1} \leq p_{K_2,m_2} \quad \text{and} \quad q_{K_1,m_1} \leq q_{K_2,m_2}.$$

In the Exercises we show the following simple facts about seminorms: If $p$ is
a seminorm on a vector space $V$ and $r$ a positive real number, then $rp$ defined by
$(rp)(x) = r p(x)$ for all $x \in V$ is again a seminorm on $V$. The maximum $p =
\max\{p_1, \ldots, p_n\}$ of finitely many seminorms $p_1, \ldots, p_n$ on $V$, which is defined by
$p(x) = \max\{p_1(x), \ldots, p_n(x)\}$ for all $x \in V$, is a seminorm on $V$ such that $p_i \leq p$
for $i = 1, \ldots, n$. This prepares us for a discussion of systems of seminorms on a
vector space.


Definition 2.3 A system $\mathcal{P}$ of seminorms on a vector space $V$ is called filtering if,
and only if, for any two seminorms $p_1, p_2 \in \mathcal{P}$ there is a seminorm $q \in \mathcal{P}$ and there
are positive numbers $r_1, r_2 \in \mathbb{R}^+$ such that $r_1 p_1 \leq q$ and $r_2 p_2 \leq q$ hold.

Certainly, not all systems of seminorms are filtering (see our first finite-dimensional
example). However it is straightforward to construct a filtering system
which contains a given system: Given a system $\mathcal{P}_0$ on a vector space $V$ one defines
the system $\mathcal{P} = \mathcal{P}(\mathcal{P}_0)$ generated by $\mathcal{P}_0$ as follows:

$$q \in \mathcal{P} \iff \exists\, p_1, \ldots, p_n \in \mathcal{P}_0\ \exists\, r_1, \ldots, r_n \in \mathbb{R}^+ : q = \max\{r_1 p_1, \ldots, r_n p_n\}.$$

One can show that $\mathcal{P}(\mathcal{P}_0)$ is the minimal filtering system of seminorms on $V$ that
contains $\mathcal{P}_0$. In our third example above we considered the following two systems
of seminorms on $V = C^k(\Omega)$:

$$\mathcal{P}_k(\Omega) = \{p_{K,m} : K \subset \Omega,\ K \text{ compact},\ 0 \leq m \leq k\},$$
$$\mathcal{Q}_k(\Omega) = \{q_{K,m} : K \subset \Omega,\ K \text{ compact},\ 0 \leq m \leq k\}.$$

In the Exercises it is shown that both are filtering.
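The construction of a generated system can be tried out on the coordinate seminorms $q_1(x) = |x_1|$, $q_2(x) = |x_2|$ on $\mathbb{R}^2$. In the sketch below, `weighted_max` builds $q = \max\{r_1 p_1, \ldots, r_n p_n\}$ and `dominated` tests an inequality $r\,p \leq q$ on a finite grid of sample points only; both names and the grid are assumptions for illustration. The sample point $(0, 0.5)$ shows why no positive multiple of $q_2$ is bounded by $q_1$, so $\{q_1, q_2\}$ alone is not filtering, while the maximum dominates both.

```python
def weighted_max(weighted_seminorms):
    """q = max{r_1 p_1, ..., r_n p_n} for a list of pairs (r_i, p_i) with r_i > 0."""
    return lambda x: max(r * p(x) for r, p in weighted_seminorms)

def dominated(p, q, r, points):
    """Check r * p(x) <= q(x) on the given sample points only (not a proof)."""
    return all(r * p(x) <= q(x) for x in points)

# The coordinate seminorms on R^2 from the first finite-dimensional example
q1 = lambda x: abs(x[0])
q2 = lambda x: abs(x[1])

# A member of the generated system: q = max{q1, q2}
q = weighted_max([(1.0, q1), (1.0, q2)])

# Grid of sample points in [-3, 3]^2, including points on the coordinate axes
samples = [(a / 2.0, b / 2.0) for a in range(-6, 7) for b in range(-6, 7)]

q_dominates_q1 = dominated(q1, q, 1.0, samples)
q_dominates_q2 = dominated(q2, q, 1.0, samples)

# Even a tiny multiple of q2 is not bounded by q1: the point (0, 0.5) is a witness
tiny_multiple_of_q2_below_q1 = dominated(q2, q1, 1e-6, samples)
```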
Our first use of the open p-balls is to define a topology. 


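Theorem 2.1 Suppose that $\mathcal{P}$ is a filtering system of seminorms on a vector space
$V$. Define a system $\mathcal{T}_\mathcal{P}$ of subsets of $V$ as follows: A subset $U \subset V$ belongs to $\mathcal{T}_\mathcal{P}$ if,
and only if, either $U = \emptyset$ or

$$\forall x \in U\ \exists\, p \in \mathcal{P},\ \exists\, r > 0 : \ B_{p,r}(x) \subseteq U.$$

Then $\mathcal{T}_\mathcal{P}$ is a topology on $V$ in which every point $x \in V$ has a neighborhood basis
$\mathcal{V}_x$ consisting of open $p$-balls, $\mathcal{V}_x = \{B_{p,r}(x) : p \in \mathcal{P},\ r > 0\}$.

Proof Suppose we are given $U_i \in \mathcal{T}_\mathcal{P}$, $i \in I$. We are going to show that $U =
\bigcup_{i \in I} U_i \in \mathcal{T}_\mathcal{P}$. Take any $x \in U$, then $x \in U_i$ for some $i \in I$. Thus $U_i \in \mathcal{T}_\mathcal{P}$ implies:
There are $p \in \mathcal{P}$ and $r > 0$ such that $B_{p,r}(x) \subseteq U_i$. It follows that $B_{p,r}(x) \subseteq U$,
hence $U \in \mathcal{T}_\mathcal{P}$. Next assume that $U_1, \ldots, U_n \in \mathcal{T}_\mathcal{P}$ are given. Denote $U = \bigcap_{i=1}^{n} U_i$
and consider $x \in U \subseteq U_i$, $i = 1, \ldots, n$. Therefore, for $i = 1, \ldots, n$, there are
$p_i \in \mathcal{P}$ and $r_i > 0$ such that $B_{p_i,r_i}(x) \subseteq U_i$. Since the system $\mathcal{P}$ is filtering, there
is a $p \in \mathcal{P}$ and there are $\rho_i > 0$ such that $\rho_i p_i \leq p$ for $i = 1, \ldots, n$. Define
$r = \min\{\rho_1 r_1, \ldots, \rho_n r_n\}$. It follows that $B_{p,r}(x) \subseteq B_{p_i,r_i}(x)$ for $i = 1, \ldots, n$ and
therefore $B_{p,r}(x) \subseteq \bigcap_{i=1}^{n} U_i = U$. Hence the system $\mathcal{T}_\mathcal{P}$ satisfies the three axioms
of a topology. By definition $\mathcal{T}_\mathcal{P}$ is the topology defined by the system $\mathcal{V}_x$ of open
$p$-balls as a neighborhood basis of a point $x \in V$.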

This result shows that there is a unique way to construct a topology on a vector
space as soon as one is given a filtering system of seminorms. Suppose now that two
filtering systems $\mathcal{P}$ and $\mathcal{Q}$ of seminorms are given on a vector space $V$. Then we get
two topologies $\mathcal{T}_\mathcal{P}$ and $\mathcal{T}_\mathcal{Q}$ on $V$ and naturally one would like to know how these
topologies compare, in particular when they are equal. This question is answered in
the following proposition.


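Proposition 2.1 Given two filtering systems $\mathcal{P}$ and $\mathcal{Q}$ on a vector space $V$, construct
the topologies $\mathcal{T}_\mathcal{P}$ and $\mathcal{T}_\mathcal{Q}$ on $V$ according to Theorem 2.1. Then the following two
statements are equivalent:

(i) $\mathcal{T}_\mathcal{P} = \mathcal{T}_\mathcal{Q}$.
(ii) $\forall p \in \mathcal{P}\ \exists\, q \in \mathcal{Q}\ \exists\, \lambda > 0 : p \leq \lambda q$ and $\forall q \in \mathcal{Q}\ \exists\, p \in \mathcal{P}\ \exists\, \lambda > 0 : q \leq \lambda p$.

Two systems $\mathcal{P}$ and $\mathcal{Q}$ of seminorms on a vector space $V$ are called equivalent if,
and only if, any of these equivalent conditions holds.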

The main technical element of the proof of this proposition is the following ele- 
mentary but widely used lemma about the relation of open p-balls and their defining 
seminorms. Its proof is left as an exercise. 


Lemma 2.2 Suppose that $p$ and $q$ are two seminorms on a vector space $V$. Then,
for any $r > 0$ and $R > 0$, the following holds:


$$p \le \frac{r}{R}\, q \iff \text{for any } x \in V :\ B_{q,R}(x) \subseteq B_{p,r}(x). \tag{2.3}$$
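
The lemma can be sanity-checked numerically. The following is only a sketch under assumptions that are not from the text: $V = \mathbb{R}^2$, $p$ is the maximum norm and $q$ the $\ell^1$ norm, so that $p \le q$ holds while $p \le \frac{1}{2} q$ fails. Since every ball is a translate of a ball centred at the origin, only the inclusion $B_{q,R}(0) \subseteq B_{p,r}(0)$ is sampled. All function names are hypothetical.

```python
import random

def p(v):
    """Maximum norm on R^2 (plays the role of the seminorm p)."""
    return max(abs(v[0]), abs(v[1]))

def q(v):
    """l^1 norm on R^2 (plays the role of the seminorm q)."""
    return abs(v[0]) + abs(v[1])

def ball_inclusion_holds(r, R, samples=20000, seed=0):
    """Sample points z with q(z) < R and report whether all of them satisfy p(z) < r."""
    rng = random.Random(seed)
    for _ in range(samples):
        z = (rng.uniform(-R, R), rng.uniform(-R, R))
        if q(z) < R and not p(z) < r:
            return False  # found a point of B_{q,R}(0) outside B_{p,r}(0)
    return True

# p <= (r/R) q with r/R = 1 is true, so the inclusion should hold.
print(ball_inclusion_holds(r=1.0, R=1.0))
# p <= (1/2) q is false, so the sampler should find a counterexample.
print(ball_inclusion_holds(r=0.5, R=1.0))
```

A random search can only refute an inclusion, never prove it; the first call is consistent with the lemma, the second exhibits its contrapositive.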


Proof (of Proposition 2.1) Assume condition (i). Then every open $p$-ball $B_{p,r}(x)$ is open
for the topology $\mathcal{T}_{\mathcal{Q}}$, hence there is an open $q$-ball $B_{q,R}(x) \subset B_{p,r}(x)$. By the lemma
we conclude that $p \le \frac{r}{R} q$. Condition (i) also implies that every open $q$-ball is open
for the topology $\mathcal{T}_{\mathcal{P}}$, hence we deduce in the same way $q \le \lambda p$ for some $\lambda > 0$.
Therefore condition (ii) holds.

Conversely, suppose that condition (ii) holds. Then, using again the lemma, one
deduces: For every open $p$-ball $B_{p,r}(x)$ there is an open $q$-ball $B_{q,R}(x) \subset B_{p,r}(x)$,
and for every open $q$-ball $B_{q,R}(x)$ there is an open $p$-ball $B_{p,r}(x) \subset B_{q,R}(x)$. This
then implies that the two topologies $\mathcal{T}_{\mathcal{P}}$ and $\mathcal{T}_{\mathcal{Q}}$ coincide.

Recall that a topological space is called Hausdorff if any two distinct points can 
be separated by disjoint neighborhoods. There is a convenient way to decide when 
the topology $\mathcal{T}_{\mathcal{P}}$ defined by a filtering system of seminorms is Hausdorff.


Proposition 2.2 Suppose $\mathcal{P}$ is a filtering system of seminorms on a vector space
$V$. Then the topology $\mathcal{T}_{\mathcal{P}}$ is Hausdorff if, and only if, for every $x \in V$, $x \ne 0$, there
is a seminorm $p \in \mathcal{P}$ such that $p(x) > 0$.


Proof Suppose that the topological space $(V, \mathcal{T}_{\mathcal{P}})$ is Hausdorff and $x \in V$ is given,
$x \ne 0$. Then there are two open balls $B_{p,r}(0)$ and $B_{q,R}(x)$ which do not intersect.
By definition of these balls it follows that $p(x) \ge r > 0$ and the condition of
the proposition holds. Conversely assume that the condition holds and two points
$x, y \in V$, $x - y \ne 0$, are given. There is a $p \in \mathcal{P}$ such that $0 < 2r = p(x - y)$.
Then the open balls $B_{p,r}(x)$ and $B_{p,r}(y)$ do not intersect. (If $z \in V$ were a point
belonging to both balls, then we would have $p(z - x) < r$ and $p(z - y) < r$ and
therefore $2r = p(x - y) = p(x - z + z - y) \le p(x - z) + p(z - y) < r + r = 2r$,
a contradiction.) Hence the topology $\mathcal{T}_{\mathcal{P}}$ is Hausdorff.
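
As an illustration of the criterion, here is a small worked example that is not taken from the text: a single seminorm on $\mathbb{R}^2$ that fails to separate points, and a second seminorm that repairs this. The seminorms $p$, $q$ and their maximum are chosen purely for illustration.

```latex
% Assumed example: V = R^2, one seminorm p, system P = {p} (trivially filtering).
\[
  p(x) = |x_1| , \qquad x = (x_1, x_2) \in \mathbb{R}^2 .
\]
% The point e = (0,1) is nonzero but p(e) = 0, so e lies in every ball around 0:
\[
  e = (0,1) \neq 0, \qquad p(e) = 0
  \quad\Longrightarrow\quad e \in B_{p,r}(0) \ \text{ for all } r > 0 .
\]
% Hence 0 and e cannot be separated and the topology is not Hausdorff.
% Adding q(x) = |x_2| and s = max{p, q} gives the filtering system {p, q, s},
% and for every x \neq 0 one has s(x) > 0, so the criterion is satisfied:
\[
  s(x) = \max\{ |x_1| , |x_2| \} > 0 \qquad \text{for all } x \neq 0 .
\]
```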

Finally, we discuss the continuity of the basic vector space operations of addition
and scalar multiplication with respect to the topology $\mathcal{T}_{\mathcal{P}}$ defined by a filtering system
$\mathcal{P}$ of seminorms on a vector space $V$. Recall that a function $f : E \to F$ from a
topological space $E$ into a topological space $F$ is continuous at a point $x \in E$ if, and
only if, the following condition is satisfied: For every neighborhood $U$ of the point
$y = f(x)$ in $F$ there is a neighborhood $W$ of $x$ in $E$ such that $f(W) \subset U$, and it is
enough to consider instead of general neighborhoods $U$ and $W$ only elements of a
neighborhood basis of $f(x)$, respectively $x$.


Proposition 2.3 Let $\mathcal{P}$ be a filtering system of seminorms on a vector space $V$. Then
addition (A) and scalar multiplication (M) of the vector space $V$ are continuous
with respect to the topology $\mathcal{T}_{\mathcal{P}}$, hence $(V, \mathcal{T}_{\mathcal{P}})$ is a topological vector space. This
topological vector space is usually denoted by


$(V, \mathcal{P})$ or $V[\mathcal{P}]$.


Proof We show that the addition $A : V \times V \to V$ is continuous at any point
$(x, y) \in V \times V$. Naturally, the product space $V \times V$ is equipped with the product
topology of $\mathcal{T}_{\mathcal{P}}$. Given any open $p$-ball $B_{p,2r}(x + y)$ for some $r > 0$, then
$A(B_{p,r}(x) \times B_{p,r}(y)) \subset B_{p,2r}(x + y)$, since for all $(x', y') \in B_{p,r}(x) \times B_{p,r}(y)$ we have
$p(A(x', y') - A(x, y)) = p((x' + y') - (x + y)) = p(x' - x + y' - y) \le p(x' - x) + p(y' - y) < r + r = 2r$.

Continuity of scalar multiplication M is proved in a similar way.
We summarize our results in the following theorem. 


Theorem 2.2 Let $\mathcal{P}$ be a filtering system of seminorms on a vector space $V$. Equip $V$
with the induced topology $\mathcal{T}_{\mathcal{P}}$. Then $(V, \mathcal{T}_{\mathcal{P}}) = V[\mathcal{T}_{\mathcal{P}}]$ is a locally convex topological
vector space. It is Hausdorff, i.e., a HLCTVS, if, and only if, for every $x \in V$, $x \ne 0$,
there is a $p \in \mathcal{P}$ such that $p(x) > 0$.


Proof By Theorem 2.1 every point $x \in V$ has a neighborhood basis $\mathcal{V}_x$ consisting
of open $p$-balls. These balls are absolutely convex (i.e. $y, z \in B_{p,r}(x)$, $\alpha, \beta \in \mathbb{K}$,
$\alpha + \beta = 1$, $|\alpha| + |\beta| \le 1$ implies $\alpha y + \beta z \in B_{p,r}(x)$) by the properties of $p$-balls
listed earlier. Hence by Proposition 2.3 $V[\mathcal{T}_{\mathcal{P}}]$ is a LCTVS. Finally by Proposition
2.2 we conclude.


2.1.1 Examples of HLCTVS 


The examples of HLCTVS which we are going to discuss serve a dual purpose. 
Naturally they are considered in order to illustrate the concepts and results introduced 
above. Then later they will be used as building blocks of the test function spaces used 
in distribution theory. 


1. Recall the filtering systems of seminorms $\mathcal{P}_k(\Omega)$ and $\mathcal{Q}_k(\Omega)$ introduced earlier
on the vector space $C^k(\Omega)$ of $k$ times continuously differentiable functions on
an open nonempty subset $\Omega \subseteq \mathbb{R}^n$. With the help of Theorem 2.2 it is easy to
show that both $(C^k(\Omega), \mathcal{P}_k(\Omega))$ and $(C^k(\Omega), \mathcal{Q}_k(\Omega))$ are Hausdorff locally convex
topological vector spaces.


2. Fix a compact subset $K$ of some open nonempty set $\Omega \subseteq \mathbb{R}^n$ and consider the
space $C_K^\infty(\Omega)$ of all functions $\phi : \Omega \to \mathbb{K}$ which are infinitely often differentiable
on $\Omega$ and which have their support in $K$, i.e., $\operatorname{supp} \phi \subseteq K$. On $C_K^\infty(\Omega)$ consider
the systems of seminorms


$$\mathcal{P}_K(\Omega) = \{p_{K,m} : m = 0, 1, 2, \dots\}, \qquad \mathcal{Q}_K(\Omega) = \{q_{K,m} : m = 0, 1, 2, \dots\}$$


introduced in Eq. (2.1), respectively in Eq. (2.2). Both systems are obviously
filtering, and both $p_{K,m}$ and $q_{K,m}$ are norms on $C_K^\infty(\Omega)$. In the Exercises it is
shown that both systems are equivalent and thus we get that


$$\mathcal{D}_K(\Omega) = (C_K^\infty(\Omega), \mathcal{P}_K(\Omega)) = (C_K^\infty(\Omega), \mathcal{Q}_K(\Omega)) \tag{2.4}$$


is a Hausdorff locally convex topological vector space.


3. Now let $\Omega \subseteq \mathbb{R}^n$ be an open nonempty subset which may be unbounded. Consider
the vector space $C^k(\Omega)$ of functions $\phi : \Omega \to \mathbb{K}$ which have continuous derivatives
up to order $k$. Introduce two families of symmetric and subadditive functions
$C^k(\Omega) \to [0, +\infty]$ by defining, for $l = 0, 1, 2, \dots, k$ and $m = 0, 1, 2, \dots$,


$$p_{m,l}(\phi) = \sup_{x \in \Omega,\ |\alpha| \le l} (1 + x^2)^{m/2}\, |D^\alpha \phi(x)|,$$
$$q_{m,l}(\phi) = \Bigl( \sum_{|\alpha| \le l} \int_\Omega (1 + x^2)^{m/2}\, |D^\alpha \phi(x)|^2 \, dx \Bigr)^{1/2}.$$

For $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ we use the notation $x^2 = x_1^2 + \cdots + x_n^2$ and $|x| = \sqrt{x^2}$.


Define the following subspace of $C^k(\Omega)$:

$$C_m^k(\Omega) = \{\phi \in C^k(\Omega) : p_{m,l}(\phi) < \infty,\ l = 0, 1, \dots, k\}.$$


Then the system of norms $\{p_{m,l} : 0 \le l \le k\}$ is filtering on this subspace and thus
$(C_m^k(\Omega), \{p_{m,l} : 0 \le l \le k\})$ is a HLCTVS. $C_m^k(\Omega)$ is the space of continuously
differentiable functions which decay at infinity (if $\Omega$ is unbounded), with all
derivatives of order $\le k$, at least as $|x|^{-m}$. Similarly one can build a HLCTVS
by using the system of norms $q_{m,l}$, $0 \le l \le k$.


4. In this example we use some basic facts from Lebesgue integration theory [1].
Let $\Omega \subset \mathbb{R}^n$ be a nonempty measurable set. On the vector space $L^1_{\mathrm{loc}}(\Omega)$ of all
measurable functions $f : \Omega \to \mathbb{K}$ which are locally integrable, i.e., for which


$$\|f\|_K = \int_K |f(x)| \, dx$$


is finite for every compact subset $K \subset \Omega$, consider the system of seminorms
$\mathcal{P} = \{\|\cdot\|_K : K \subset \Omega,\ K \text{ compact}\}$. Since the finite union of compact sets is
compact, it follows easily that this system is filtering. If $f \in L^1_{\mathrm{loc}}(\Omega)$ is given and
if $f \ne 0$, then there is a compact set $K$ such that $\|f\|_K > 0$, since $f \ne 0$ means
that $f$ is different from zero on a set of positive Lebesgue measure. Therefore,
by Theorem 2.2, the space


$$\bigl(L^1_{\mathrm{loc}}(\Omega),\ \{\|\cdot\|_K : K \subset \Omega,\ K \text{ compact}\}\bigr)$$


is a HLCTVS.


2.1.2 Continuity and Convergence in a HLCTVS


Since the topology of a LCTVS V[P] is defined in terms of a filtering system P 
of seminorms it is, in most cases, much more convenient to have a characterization 
of the basic concepts of convergence, of a Cauchy sequence, and of continuity in 
terms of the seminorms directly instead of having to rely on the general topological 
definitions. Such characterizations will be given in this subsection. 

Recall: A sequence $(x^i)_{i \in \mathbb{N}}$ of points $x^i = (x^i_1, \dots, x^i_n) \in \mathbb{R}^n$ is said to converge
if, and only if, there is a point $x \in \mathbb{R}^n$ such that for every open Euclidean ball
$B_r(x) = \{y \in \mathbb{R}^n : |y - x| < r\}$ only a finite number of elements of the sequence
are not contained in this ball, i.e., there is an index $i_0$, depending on $r > 0$, such
that $x^i \in B_r(x)$ for all $i \ge i_0$, or expressed directly in terms of the Euclidean norm,
$|x^i - x| < r$ for all $i \ge i_0$.

Similarly one proceeds in a general HLCTVS $V[\mathcal{P}]$ where now, however, instead
of the Euclidean norm $|\cdot|$ all the seminorms $p \in \mathcal{P}$ have to be taken into account.


Definition 2.4 Let $V[\mathcal{P}]$ be a HLCTVS and $(x_i)_{i \in \mathbb{N}}$ a sequence in $V[\mathcal{P}]$. Then one
says:


1. The sequence $(x_i)_{i \in \mathbb{N}}$ converges (in $V[\mathcal{P}]$) if, and only if, there is an $x \in V$
(called a limit point of the sequence) such that for every $p \in \mathcal{P}$ and for every
$r > 0$ there is an index $i_0 = i_0(p, r)$ depending on $p$ and $r$ such that $p(x - x_i) < r$
for all $i \ge i_0$.

2. The sequence $(x_i)_{i \in \mathbb{N}}$ is a Cauchy sequence if, and only if, for every $p \in \mathcal{P}$
and every $r > 0$ there is an index $i_0 = i_0(p, r)$ such that $p(x_i - x_j) < r$ for all
$i, j \ge i_0$.

The following immediate results are well known in R”. 

Theorem 2.3 


(a) Every convergent sequence in a LCTVS $V[\mathcal{P}]$ is a Cauchy sequence.
(b) In a HLCTVS $V[\mathcal{P}]$ the limit point of a convergent sequence is unique.


Proof Suppose a sequence $(x_i)_{i \in \mathbb{N}}$ converges in $V[\mathcal{P}]$ to $x \in V$. Then, for any $p \in \mathcal{P}$
and any $r > 0$, there is an $i_0 \in \mathbb{N}$ such that $p(x - x_i) < r/2$ for all $i \ge i_0$. Therefore,
for all $i, j \ge i_0$, one has $p(x_i - x_j) = p((x - x_j) + (x_i - x)) \le p(x - x_j) + p(x_i - x) <
\frac{r}{2} + \frac{r}{2} = r$, hence $(x_i)_{i \in \mathbb{N}}$ is a Cauchy sequence and part (a) follows.

Suppose $V[\mathcal{P}]$ is a HLCTVS and $(x_i)_{i \in \mathbb{N}}$ is a convergent sequence in $V[\mathcal{P}]$.
Assume that for $x, y \in V$ the condition in the definition of convergence holds, i.e.,
for every $p \in \mathcal{P}$ and every $r > 0$ there is an $i_1$ such that $p(x - x_i) < r$ for all $i \ge i_1$
and there is an $i_2$ such that $p(y - x_i) < r$ for all $i \ge i_2$. Then, for all $i \ge \max\{i_1, i_2\}$,
$p(x - y) = p(x - x_i + x_i - y) \le p(x - x_i) + p(x_i - y) < r + r = 2r$, and since
$r > 0$ is arbitrary, it follows that $p(x - y) = 0$. Since this holds for every $p \in \mathcal{P}$
and $V[\mathcal{P}]$ is Hausdorff, we conclude (see Proposition 2.2) that $x = y$ and thus part
(b) follows.

Part (a) of Theorem 2.3 raises naturally the question whether the converse holds 
too, i.e. whether every Cauchy sequence converges. In general, this is not the case. 
Spaces in which this statement holds are distinguished according to the following 
definition. 


Definition 2.5 A HLCTVS in which every Cauchy sequence converges is called 
sequentially complete. 


Example 2.3 


1. By construction, the field $\mathbb{R}$ of real numbers equipped with the absolute value $|\cdot|$
as a norm is a sequentially complete HLCTVS.

2. The Euclidean spaces $(\mathbb{R}^n, |\cdot|)$, $n = 1, 2, \dots$, are HLCTVS. Here $|\cdot|$ denotes the
Euclidean norm.

3. For any $\Omega \subset \mathbb{R}^n$, $\Omega$ open and nonempty, and $k = 0, 1, 2, \dots$, the space


$$C^k(\Omega)[\mathcal{P}_k(\Omega)]$$


is a sequentially complete HLCTVS. This is shown in the Exercises. Recall the
definition


$$\mathcal{P}_k(\Omega) = \{p_{K,m} : K \subset \Omega,\ K \text{ compact},\ 0 \le m \le k\}.$$


Note that $C^k(\Omega)[\mathcal{P}_k(\Omega)]$ is equipped with the topology of uniform convergence
of all derivatives of order $\le k$ on all compact subsets of $\Omega$.


Compared to a general topological vector space one has a fairly explicit description of 
the topology in a locally convex topological vector space. Here, as we have learned, 
each point has a neighborhood basis consisting of open balls, and thus formulating 
the definition of continuity one can completely rely on these open balls. This then has 
an immediate translation into conditions involving only the systems of seminorms 
which define the topology. Suppose that $X[\mathcal{P}]$ and $Y[\mathcal{Q}]$ are two LCTVS. Then a
function $f : X \to Y$ is said to be continuous at $x_0 \in X$ if, and only if, for every
open $q$-ball $B_{q,R}(f(x_0))$ in $Y[\mathcal{Q}]$ there is an open $p$-ball $B_{p,r}(x_0)$ in $X[\mathcal{P}]$ which is
mapped by $f$ into $B_{q,R}(f(x_0))$. This can also be expressed as follows:


Definition 2.6 Assume that $X[\mathcal{P}]$ and $Y[\mathcal{Q}]$ are two LCTVS. A function
$f : X \to Y$ is said to be continuous at $x_0 \in X$ if, and only if, for every seminorm
$q \in \mathcal{Q}$ and every $R > 0$ there are $p \in \mathcal{P}$ and $r > 0$ such that for all $x \in X$ the
condition $p(x - x_0) < r$ implies $q(f(x) - f(x_0)) < R$. $f$ is called continuous on
$X$ if, and only if, $f$ is continuous at every point $x_0 \in X$.


Our main interest, however, is in linear functions from one locally convex topological
vector space to another. For them one can give a characterization of continuity which 
in most cases, in particular in concrete examples, is much easier to verify. This 
characterization is prepared by the following definition. 


Definition 2.7 Assume that $X[\mathcal{P}]$ and $Y[\mathcal{Q}]$ are two LCTVS. A linear function
$f : X \to Y$ is said to be bounded if, and only if, for every seminorm $q \in \mathcal{Q}$ there
are $p \in \mathcal{P}$ and $\lambda > 0$ such that for all $x \in X$ one has


$$q(f(x)) \le \lambda\, p(x). \tag{2.5}$$


The announced characterization of continuity now has a simple formulation. 


Theorem 2.4 Let $X[\mathcal{P}]$ and $Y[\mathcal{Q}]$ be two LCTVS and $f : X \to Y$ a linear function.
Then $f$ is continuous if, and only if, it is bounded.


Proof Suppose that $f$ is bounded, i.e., given $q \in \mathcal{Q}$ there are $p \in \mathcal{P}$ and $\lambda > 0$
such that $q \circ f \le \lambda p$. It follows for any $x, y \in X$: $q(f(y) - f(x)) = q(f(y - x)) \le
\lambda p(y - x)$. Continuity of $f$ at $x$ is now evident: Given $q \in \mathcal{Q}$ and $R > 0$, take $r = \frac{R}{\lambda}$
and the seminorm $p \in \mathcal{P}$ from the boundedness condition.

Conversely assume that $f$ is continuous. Then $f$ is continuous at $0 \in X$. Hence,
given $q \in \mathcal{Q}$ and $R > 0$ there are $p \in \mathcal{P}$ and $r > 0$ such that $p(x) < r$ implies
$q(f(x)) < R$ (we use here that $f(0) = 0$ for a linear function). This shows:
$B_{p,r}(0) \subseteq B_{q \circ f, R}(0)$, and therefore by Lemma 2.2 we conclude that $q \circ f \le \frac{R}{r} p$, i.e., $f$ is
bounded.
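
To see how the boundedness criterion is used in practice, consider the following worked example. It is only a sketch and assumes that the seminorms of Eq. (2.1) have the form $p_{K,m}(\phi) = \sup_{x \in K,\ |\alpha| \le m} |D^\alpha \phi(x)|$, which is the form used later in this chapter; the map $D$ is introduced here purely for illustration.

```latex
% Assumed setting: \Omega \subset \mathbb{R} open, X = C^1(\Omega)[P_1(\Omega)],
% Y = C^0(\Omega)[P_0(\Omega)], and D the differentiation map D\phi = \phi'.
\[
  D : C^1(\Omega) \to C^0(\Omega), \qquad D\phi = \phi' .
\]
% For every compact K \subset \Omega the seminorm q = p_{K,0} on Y satisfies
\[
  p_{K,0}(D\phi) = \sup_{x \in K} |\phi'(x)|
  \;\le\; \sup_{x \in K,\ j \le 1} |\phi^{(j)}(x)| = p_{K,1}(\phi) ,
\]
% so estimate (2.5) holds with p = p_{K,1} and \lambda = 1.
% D is therefore bounded and, by Theorem 2.4, continuous.
```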

The proof of this theorem shows actually some further details about continuity of 
linear functions on LCTVS. We summarize them as a corollary. 


Corollary 2.1 Let X[P] and Y[Q] be two LCTVS and f : X — Y a linear 
function. Then the following statements are equivalent. 


1. f is continuous at the origin x = 0. 

2. f is continuous at some point x € X. 

3. f is continuous. 

4. f is bounded. 

5. $f$ is bounded on some open ball $B_{p,r}(0)$ in $X[\mathcal{P}]$.


Definition 2.8 The topological dual $X'[\mathcal{P}]$ of a Hausdorff topological vector space
$X[\mathcal{P}]$ over the field $\mathbb{K}$ is by definition the space of all continuous linear functions
$X[\mathcal{P}] \to \mathbb{K}$.

We conclude this subsection with a discussion of an important special case of a
HLCTVS. Suppose that $X[\mathcal{P}]$ is a HLCTVS and that the filtering system of seminorms
is countable, i.e., $\mathcal{P} = \{p_i : i \in \mathbb{N}\}$ with $p_i \le p_{i+1}$ for all $i = 0, 1, 2, \dots$.
Then the topology $\mathcal{T}_{\mathcal{P}}$ of $X[\mathcal{P}]$ can be defined in terms of a metric $d$, i.e., a function
$d : X \times X \to \mathbb{R}$ with the following properties:


1. $d(x, y) \ge 0$ for all $x, y \in X$;
2. $d(x, y) = d(y, x)$ for all $x, y \in X$;
3. $d(x, y) \le d(x, z) + d(z, y)$ for all $x, y, z \in X$;
4. $d(x, y) = 0 \iff x = y$.


In terms of the given system of seminorms, the metric can be expressed as:


$$d(x, y) = \sum_{i=0}^{\infty} \frac{1}{2^i}\, \frac{p_i(x - y)}{1 + p_i(x - y)}. \tag{2.6}$$


In the Exercises we show that this function is indeed a metric on $X$ which defines
the given topology by using as open balls with centre $x$ and radius $r > 0$ the sets
$B_{d,r}(x) = \{y \in X : d(y, x) < r\}$. A HLCTVS $X[\mathcal{P}]$ is called metrizable if, and only
if, its topology $\mathcal{T}_{\mathcal{P}}$ can be defined in terms of a metric. Some other special cases are
addressed in the Exercises as well.
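
The metric axioms can be checked numerically on a toy model. The sketch below assumes the usual form $d(x, y) = \sum_i 2^{-i}\, p_i(x - y) / (1 + p_i(x - y))$ of a metric built from an increasing countable family of seminorms. Everything else is a hypothetical choice made for illustration: vectors of fixed finite length stand in for the elements of $X$, the seminorms are $p_i(x) = \max(|x_0|, \dots, |x_i|)$, and the sum is truncated at the vector length.

```python
def seminorm(i, x):
    """Toy seminorm p_i(x) = max(|x_0|, ..., |x_i|); increasing in i."""
    return max(abs(c) for c in x[: i + 1])

def metric(x, y):
    """Truncated sum of 2^{-i} * p_i(x - y) / (1 + p_i(x - y)) over i < len(x)."""
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for i in range(len(diff)):
        s = seminorm(i, diff)
        total += 2.0 ** (-i) * s / (1.0 + s)
    return total

x = [1.0, -2.0, 0.5]
y = [0.0, 3.0, 0.5]
z = [2.0, 2.0, 2.0]
print(metric(x, y), metric(y, x))                    # symmetry
print(metric(x, z) <= metric(x, y) + metric(y, z))   # triangle inequality
```

The triangle inequality rests on the fact that $t \mapsto t/(1+t)$ is increasing and subadditive on $[0, \infty)$; each summand is bounded by $2^{-i}$, so the series always converges and $d < 2$.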

We conclude this section with an example of a complete metrizable HLCTVS 
which will play an important role in the definition of the basic test function spaces. 


Proposition 2.4 Let $\Omega \subset \mathbb{R}^n$ be any nonempty open set and $K \subset \Omega$ any compact
subset. Then the space $\mathcal{D}_K(\Omega)$ introduced in (2.4) is a complete metrizable HLCTVS.


Proof That this space is metrizable is clear from the definition. The proof of 
completeness is left as an exercise. 


2.2 Basic Test Function Spaces of Distribution Theory 


The previous sections provide nearly all concepts and results which are needed for the 
definition of the standard test function spaces and the study of their basic properties. 
The important items that are missing are the concepts of inductive and projective 
limits of TVS. Here we take a practical approach by defining these concepts not 
abstractly but only in the context where they are used. We discuss now the underlying 
test function spaces of general (Schwartz) distributions, of tempered distributions, 
and of distributions with compact support. 


2.2.1 The Test Function Space $\mathcal{D}(\Omega)$ of $C^\infty$ Functions of Compact
Support


For a nonempty open subset $\Omega \subseteq \mathbb{R}^n$ recall the spaces $\mathcal{D}_K(\Omega)$, $K \subset \Omega$ compact, as
introduced in Eq. (2.4) and note the following:


$$K_1 \subset K_2 \subset \Omega,\ K_1, K_2 \text{ compact} \ \Longrightarrow\ \mathcal{D}_{K_1}(\Omega) \subseteq \mathcal{D}_{K_2}(\Omega).$$


The statement "$\mathcal{D}_{K_1}(\Omega) \subseteq \mathcal{D}_{K_2}(\Omega)$" actually means two things:


1. The vector space $C_{K_1}^\infty(\Omega)$ is a subspace of the vector space $C_{K_2}^\infty(\Omega)$.

2. The restriction of the topology of $\mathcal{D}_{K_2}(\Omega)$ to the subspace $\mathcal{D}_{K_1}(\Omega)$ equals the
original topology of $\mathcal{D}_{K_1}(\Omega)$ as defined in Eq. (2.4).


Now denote by $\mathcal{K} = \mathcal{K}(\Omega)$ the set of all compact subsets of $\Omega$ and define


$$\mathcal{D}(\Omega) = \bigcup_{K \in \mathcal{K}} \mathcal{D}_K(\Omega). \tag{2.7}$$


Then $\mathcal{D}(\Omega)$ is the set of functions $\phi : \Omega \to \mathbb{K}$ of class $C^\infty$ which have a compact
support in $\Omega$. It is easy to show that this set is actually a vector space over $\mathbb{K}$. In order
to define a topology on $\mathcal{D}(\Omega)$ denote, for $K \subset \Omega$, $K$ compact, by $i_K : \mathcal{D}_K(\Omega) \to
\mathcal{D}(\Omega)$ the identical embedding of $\mathcal{D}_K(\Omega)$ into $\mathcal{D}(\Omega)$. Define on $\mathcal{D}(\Omega)$ the strongest
locally convex topology such that all these embeddings $i_K$, $K \subset \Omega$ compact, are
continuous. Thus $\mathcal{D}(\Omega)$ becomes a HLCTVS (see Exercises). In this way the test
function space $\mathcal{D}(\Omega)$ of $C^\infty$-functions of compact support is defined as the inductive
limit of the spaces $\mathcal{D}_K(\Omega)$, $K \subset \Omega$ compact. According to this definition a function
$\phi \in C^\infty(\Omega)$ belongs to $\mathcal{D}(\Omega)$ if, and only if, it vanishes in some neighborhood of
the boundary $\partial\Omega$ of $\Omega$.

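In the Exercises it is shown that given $\Omega \subseteq \mathbb{R}^n$, $\Omega$ open and nonempty, there is a
sequence of compact sets $K_i$, $i \in \mathbb{N}$, with nonempty interior such that


$$K_i \subsetneq K_{i+1} \subset \Omega \quad \forall i \in \mathbb{N}, \qquad \bigcup_{i=1}^{\infty} K_i = \Omega.$$

It follows that, for all $i \in \mathbb{N}$,

$$\mathcal{D}_{K_i}(\Omega) \subsetneq \mathcal{D}_{K_{i+1}}(\Omega) \tag{2.8}$$


with the understanding that $\mathcal{D}_{K_i}(\Omega)$ is a proper subspace of $\mathcal{D}_{K_{i+1}}(\Omega)$ and that the
restriction of the topology of $\mathcal{D}_{K_{i+1}}(\Omega)$ to $\mathcal{D}_{K_i}(\Omega)$ is just the original topology of
$\mathcal{D}_{K_i}(\Omega)$.

One deduces that $\mathcal{D}(\Omega)$ is actually the strict (because of (2.8)) inductive limit of
the sequence of complete metrizable spaces $\mathcal{D}_{K_i}(\Omega)$, $i \in \mathbb{N}$:


$$\mathcal{D}(\Omega) = \bigcup_{i=1}^{\infty} \mathcal{D}_{K_i}(\Omega). \tag{2.9}$$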


We collect some basic properties of the test function space $\mathcal{D}(\Omega)$.


Theorem 2.5 The following statements hold for the test function space $\mathcal{D}(\Omega)$ of
compactly supported $C^\infty$-functions on $\Omega \subseteq \mathbb{R}^n$, $\Omega$ open and not empty:


1. $\mathcal{D}(\Omega)$ is the strict inductive limit of a sequence of complete metrizable Hausdorff
locally convex topological vector spaces $\mathcal{D}_{K_i}(\Omega)$.

2. $\mathcal{D}(\Omega)$ is a HLCTVS.

3. A subset $U \subset \mathcal{D}(\Omega)$ is a neighborhood of zero if, and only if, $U \cap \mathcal{D}_K(\Omega)$ is a
neighborhood of zero in $\mathcal{D}_K(\Omega)$, for every compact subset $K \subset \Omega$.

4. $\mathcal{D}(\Omega)$ is sequentially complete.

5. $\mathcal{D}(\Omega)$ is not metrizable.


Proof The first statement has been established above. After further preparation the
remaining statements are shown in the Appendix.

For many practical purposes it is important to have a concrete description of the
notion of convergence in $\mathcal{D}(\Omega)$. The following characterization results from basic
properties of inductive limits and is addressed in the Appendix.


Proposition 2.5 Let $\Omega \subseteq \mathbb{R}^n$ be a nonempty open set. Then a sequence $(\phi_i)_{i \in \mathbb{N}}$
converges in the test function space $\mathcal{D}(\Omega)$ if, and only if, there is a compact subset
$K \subset \Omega$ such that $\phi_i \in \mathcal{D}_K(\Omega)$ for all $i \in \mathbb{N}$ and this sequence converges in the
space $\mathcal{D}_K(\Omega)$.

According to the definition given earlier, a sequence $(\phi_i)_{i \in \mathbb{N}}$ converges in $\mathcal{D}_K(\Omega)$
to $\phi \in \mathcal{D}_K(\Omega)$ if, and only if, $\forall r > 0\ \forall m \in \mathbb{N}\ \exists i_0\ \forall i \ge i_0 :\ p_{K,m}(\phi - \phi_i) < r$.
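
The requirement of a common compact support in Proposition 2.5 is essential. The following standard example, which is not taken from the text and uses a hypothetical bump function $\phi$, sketches a sequence whose derivatives all tend to zero uniformly and which nevertheless does not converge in $\mathcal{D}(\mathbb{R})$.

```latex
% Assumed data: \phi \in \mathcal{D}(\mathbb{R}), \phi \neq 0, supp \phi \subseteq [-1, 1].
\[
  \phi_i(x) = \frac{1}{i}\, \phi(x - i), \qquad i = 1, 2, 3, \dots
\]
% All derivatives converge to zero uniformly on \mathbb{R}:
\[
  \sup_{x \in \mathbb{R}} \bigl| \phi_i^{(m)}(x) \bigr|
  = \frac{1}{i} \sup_{x \in \mathbb{R}} \bigl| \phi^{(m)}(x) \bigr|
  \;\longrightarrow\; 0 \qquad (i \to \infty), \quad m = 0, 1, 2, \dots
\]
% But the supports move off to infinity,
\[
  \operatorname{supp} \phi_i \subseteq [\, i - 1,\ i + 1 \,] ,
\]
% so no single compact K \subset \mathbb{R} contains all supports, and by
% Proposition 2.5 the sequence (\phi_i) does not converge in \mathcal{D}(\mathbb{R}).
```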


Proposition 2.6 Let $Y[\mathcal{Q}]$ be a locally convex topological vector space and $f :
\mathcal{D}(\Omega) \to Y[\mathcal{Q}]$ a linear function. Then $f$ is continuous if, and only if, for every
compact set $K \subset \Omega$ the map $f \circ i_K : \mathcal{D}_K(\Omega) \to Y[\mathcal{Q}]$ is continuous.


Proof By definition the test function space carries the strongest locally convex
topology such that all the embeddings $i_K : \mathcal{D}_K(\Omega) \to \mathcal{D}(\Omega)$, $K \subset \Omega$ compact, are
continuous. Thus, if $f$ is continuous, all maps $f \circ i_K$ are continuous as compositions
of continuous maps. Conversely assume that all maps $f \circ i_K$ are continuous; then
given any neighborhood of zero $U$ in $Y[\mathcal{Q}]$, we know that $(f \circ i_K)^{-1}(U) = f^{-1}(U) \cap
\mathcal{D}_K(\Omega)$ is a neighborhood of zero in $\mathcal{D}_K(\Omega)$. Since this holds for every compact
subset $K$ it follows, by part 3 of Theorem 2.5, that $f^{-1}(U) \subset \mathcal{D}(\Omega)$ is a neighborhood
of zero, hence $f$ is continuous.


2.2.2 The Test Function Space $\mathcal{S}(\Omega)$ of Strongly Decreasing
$C^\infty$-Functions on $\Omega$


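Again, $\Omega$ is an open nonempty subset of $\mathbb{R}^n$, often $\Omega = \mathbb{R}^n$. A function $\phi \in C^\infty(\Omega)$
is called strongly decreasing if, and only if, it and all its derivatives decrease faster
than $C(1 + x^2)^{-k}$, for any $k \in \mathbb{N}$, i.e., if, and only if, the following condition holds:


$$\forall \alpha \in \mathbb{N}^n\ \forall m \in \mathbb{N}_0\ \exists C\ \forall x \in \Omega :\quad |D^\alpha \phi(x)| \le \frac{C}{(1 + x^2)^{m/2}}. \tag{2.10}$$

Certainly, in this estimate the constant $C$ depends in general on the function $\phi$, the
order $\alpha$ of the derivative, and the exponent $m$ of decay. Introduce


$$\mathcal{S}_0(\Omega) = \{\phi \in C^\infty(\Omega) : \phi \text{ is strongly decreasing}\}.$$

It is straightforward to show that $\mathcal{S}_0(\Omega)$ is a vector space. The norms


$$p_{m,l}(\phi) = \sup_{x \in \Omega,\ |\alpha| \le l} (1 + x^2)^{m/2}\, |D^\alpha \phi(x)|$$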

are naturally defined on it for all $m, l = 0, 1, 2, \dots$. Equip this space with the topology
defined by the filtering system $\mathcal{P}(\Omega) = \{p_{m,l} : m, l = 0, 1, 2, \dots\}$ and introduce
the test function space of strongly decreasing $C^\infty$-functions as the Hausdorff locally
convex topological vector space


$$\mathcal{S}(\Omega) = (\mathcal{S}_0(\Omega), \mathcal{P}(\Omega)). \tag{2.11}$$
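
As a numerical illustration, the weighted supremum of a Gaussian can be estimated on a grid. The sketch below is not from the text: it takes $n = 1$, $\Omega = \mathbb{R}$, $\phi(x) = e^{-x^2}$, considers only the derivative of order zero (that is, the norm $p_{m,0}$ under the form $p_{m,l}(\phi) = \sup (1 + x^2)^{m/2} |D^\alpha \phi(x)|$), and replaces the supremum over $\mathbb{R}$ by a maximum over a finite grid. The function name and grid parameters are hypothetical.

```python
import math

def weighted_sup(m, half_width=10.0, steps=200001):
    """Grid estimate of sup_x (1 + x^2)^(m/2) * exp(-x^2) on [-half_width, half_width]."""
    best = 0.0
    for k in range(steps):
        x = -half_width + 2.0 * half_width * k / (steps - 1)
        value = (1.0 + x * x) ** (m / 2.0) * math.exp(-x * x)
        best = max(best, value)
    return best

# For m >= 2 calculus gives the exact supremum (m/2)^(m/2) * exp(1 - m/2),
# attained at x^2 = m/2 - 1; for m = 4 this is 4/e.
print(weighted_sup(4), 4.0 / math.e)
```

The estimate stays finite for every fixed $m$, in line with condition (2.10) for $\alpha = 0$; a full check of strong decrease would also have to treat all derivatives.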


Note that $\mathcal{S}_0(\Omega)$ can be expressed in terms of the function spaces $C_m^k(\Omega)$ introduced
earlier as:

$$\mathcal{S}_0(\Omega) = \bigcap_{k,m=0}^{\infty} C_m^k(\Omega).$$


Elementary facts about $\mathcal{S}(\Omega)$ are collected in the following theorem.


Theorem 2.6 The test function space $\mathcal{S}(\Omega)$ of strongly decreasing $C^\infty$-functions,
for any open and nonempty subset $\Omega \subseteq \mathbb{R}^n$, is a complete metrizable HLCTVS.


Proof Since the filtering system of norms of this space is countable, $\mathcal{S}(\Omega)$ is a
metrizable HLCTVS. Completeness of this space is shown in the Exercises. Further
properties will be presented in the Appendix.


2.2.3 The Test Function Space $\mathcal{E}(\Omega)$ of All $C^\infty$-Functions on $\Omega$


On the vector space $C^\infty(\Omega)$ we use the filtering system of seminorms $\mathcal{P}_\infty(\Omega) =
\{p_{K,m} : K \subset \Omega \text{ compact},\ m = 0, 1, 2, \dots\}$ and then introduce


$$\mathcal{E}(\Omega) = (C^\infty(\Omega), \mathcal{P}_\infty(\Omega)) \tag{2.12}$$


as the test function space of all $C^\infty$-functions with uniform convergence for all
derivatives on all compact subsets.

Note that in contrast to elements in $\mathcal{S}(\Omega)$ or $\mathcal{D}(\Omega)$, elements in $\mathcal{E}(\Omega)$ are not
restricted in their growth near the boundary of $\Omega$. Again we give the basic facts
about this test function space.


Theorem 2.7 The test function space $\mathcal{E}(\Omega)$ is a complete metrizable HLCTVS.


Proof By taking an increasing sequence of compact subsets $K_i$ which exhaust $\Omega$
(compare problem 14 of the Exercises) one shows that the topology can be defined in
terms of a countable set of seminorms; hence this space is metrizable. Completeness
of the spaces $C^k(\Omega)[\mathcal{P}_k(\Omega)]$ for all $k = 0, 1, 2, \dots$ easily implies completeness of
$\mathcal{E}(\Omega)$.


2.2.4 Relation Between the Test Function Spaces $\mathcal{D}(\Omega)$, $\mathcal{S}(\Omega)$,
and $\mathcal{E}(\Omega)$


It is fairly obvious from their definitions that as sets one has 


$$\mathcal{D}(\Omega) \subset \mathcal{S}(\Omega) \subset \mathcal{E}(\Omega). \tag{2.13}$$


The following result shows that this relation also holds for the topological structures.


Theorem 2.8 Let $\Omega \subset \mathbb{R}^n$ be a nonempty open subset. Then for the three test
function spaces introduced in the previous subsections the following holds: $\mathcal{D}(\Omega)$ is
continuously embedded into $\mathcal{S}(\Omega)$ and $\mathcal{S}(\Omega)$ is continuously embedded into $\mathcal{E}(\Omega)$.


Proof Denote by $i : \mathcal{D}(\Omega) \to \mathcal{S}(\Omega)$ and $j : \mathcal{S}(\Omega) \to \mathcal{E}(\Omega)$ the identical embeddings.
We have to show that both are continuous. According to Proposition 2.6 the embedding
$i$ is continuous if, and only if, the embeddings $i \circ i_K : \mathcal{D}_K(\Omega) \to \mathcal{S}(\Omega)$ are
continuous, for every compact subset $K \subset \Omega$. By Theorem 2.4 it suffices to show
that these linear maps are bounded. Given any seminorm $p_{m,l} \in \mathcal{P}(\Omega)$ we estimate,
for all $\phi \in \mathcal{D}_K(\Omega)$, as follows:


$$p_{m,l}(i \circ i_K(\phi)) = \sup_{x \in \Omega,\ |\alpha| \le l} (1 + x^2)^{m/2}\, |D^\alpha \phi(x)| = \sup_{x \in K,\ |\alpha| \le l} (1 + x^2)^{m/2}\, |D^\alpha \phi(x)|.$$


We deduce that, for all $\phi \in \mathcal{D}_K(\Omega)$, all $K \subset \Omega$ compact, and all $m, l = 0, 1, 2, \dots$,


$$p_{m,l}(i \circ i_K(\phi)) \le C\, p_{K,l}(\phi)$$


where $C = \sup_{x \in K} (1 + x^2)^{m/2} < \infty$. Hence the map $i \circ i_K$ is bounded and we
conclude continuity of the embedding $i$.
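Similarly we proceed for the embedding $j$. Take any seminorm $p_{K,l} \in \mathcal{P}_\infty(\Omega)$
and estimate, for all $\phi \in \mathcal{S}(\Omega)$,


$$p_{K,l}(j(\phi)) = \sup_{x \in K,\ |\alpha| \le l} |D^\alpha \phi(x)| \le \sup_{x \in \Omega,\ |\alpha| \le l} (1 + x^2)^{m/2}\, |D^\alpha \phi(x)|,$$


i.e. $p_{K,l}(j(\phi)) \le p_{m,l}(\phi)$ for all $\phi \in \mathcal{S}(\Omega)$, for all $K \subset \Omega$ compact and all $m, l =
0, 1, 2, \dots$. Hence the embedding $j$ is bounded and thus continuous.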
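
For a concrete feeling for the constant in the first estimate, one can evaluate it in a simple special case. The choice $\Omega = \mathbb{R}$ and $K = [-a, a]$ below is an assumption made only for illustration.

```latex
% Assumed special case: \Omega = \mathbb{R}, K = [-a, a] with a > 0.
\[
  C = \sup_{x \in [-a, a]} (1 + x^2)^{m/2} = (1 + a^2)^{m/2} ,
\]
% so that, for every \phi \in \mathcal{D}_K(\mathbb{R}) and all m, l = 0, 1, 2, \dots,
\[
  p_{m,l}(\phi) \;\le\; (1 + a^2)^{m/2}\, p_{K,l}(\phi) .
\]
% The constant grows with the size of K and with m, but it is finite for each
% fixed compact K, which is all that boundedness of i \circ i_K requires.
```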


2.3. Exercises 


1. Let $p$ be a seminorm on a vector space $V$. Show: The null space $N(p) =
\{x \in V : p(x) = 0\}$ is a linear subspace of $V$. $N(p)$ is trivial if, and only if, $p$ is
a norm on $V$.

2. Show: If $p$ is a seminorm on a vector space $V$ and $r > 0$, then $rp$, defined
by $(rp)(x) = r p(x)$ for all $x \in V$, is again a seminorm on $V$. If $p_1, \dots, p_n$
are seminorms on $V$, then their maximum $p = \max\{p_1, \dots, p_n\}$, defined by
$p(x) = \max\{p_1(x), \dots, p_n(x)\}$ for all $x \in V$, is a seminorm such that $p_i \le p$
for $i = 1, \dots, n$.


3. Prove the five properties of open $p$-balls stated in the text.

4. Let $p$ and $q$ be two norms on $\mathbb{R}^n$. Show: There are positive numbers $r > 0$ and
$R > 0$ such that $rq \le p \le Rq$. Thus on a finite dimensional space all norms are
equivalent.

5. Prove: The systems of seminorms $\mathcal{P}_k(\Omega)$ and $\mathcal{Q}_k(\Omega)$ on $C^k(\Omega)$ are filtering.

6. Let $\mathcal{P}$ be a filtering system of seminorms on a vector space $V$. Define the $p$-balls
$B_{p,r}(x)$ for $p \in \mathcal{P}$ and $r > 0$ and the topology $\mathcal{T}_{\mathcal{P}}$ as in Theorem 2.1. Show:
$B_{p,r}(x) \in \mathcal{T}_{\mathcal{P}}$, i.e., the balls $B_{p,r}(x)$ are open with respect to the topology $\mathcal{T}_{\mathcal{P}}$
and thus it is consistent to call them open $p$-balls.

7. Prove Lemma 2.2.

Hints: Observe that $B_{q,R}(x) \subseteq B_{p,r}(x)$ implies: Whenever $z \in V$ satisfies
$q(z) < R$, then it follows that $p(z) < r$. Now fix any $y \in V$ and define,
for any $\sigma > 0$, $z = \frac{R}{q(y) + \sigma}\, y$; it follows that $q(z) = \frac{R}{q(y) + \sigma}\, q(y) < R$, hence
$p(z) = \frac{R}{q(y) + \sigma}\, p(y) < r$, or $p(y) < \frac{r}{R}(q(y) + \sigma)$. Since $\sigma > 0$ is arbitrary, we
conclude that $p(y) \le \frac{r}{R} q(y)$ and since this holds for any $y \in V$ we conclude
that $p \le \frac{r}{R} q$. The converse direction is straightforward.


8. On the vector space $V = \mathbb{K}^n$, define the following functions:

a) $q(x) = \sqrt{\sum_{i=1}^n |x_i|^2}$, $x = (x_1, \dots, x_n) \in \mathbb{K}^n$;

b) $p(x) = \max\{|x_1|, \dots, |x_n|\}$;

c) $r(x) = |x_1| + \cdots + |x_n|$.

Show that these functions are actually norms on $\mathbb{K}^n$ and all define the same
topology.


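9. Show that the two systems of seminorms $\mathcal{P}_K(\Omega)$ and $\mathcal{Q}_K(\Omega)$ on $C_K^\infty(\Omega)$ (see
section "Examples of HLCTVS") are equivalent.

Hints: It is a straightforward estimate to get $q_{K,l}(\phi) \le C_{K,l}\, p_{K,l}(\phi)$ for some
constant $C_{K,l}$ depending on $l$ and $|K| = \int_K dx$. The converse estimate is
particularly simple for $n = 1$. There we use for $\phi \in C_K^\infty(\Omega)$ and $\alpha =
0, 1, 2, \dots$ the representation $\phi^{(\alpha)}(x) = \int_{-\infty}^x \phi^{(\alpha+1)}(y)\, dy$ to estimate $|\phi^{(\alpha)}(x)| \le
|K|^{1/2} \bigl( \int_K |\phi^{(\alpha+1)}(y)|^2\, dy \bigr)^{1/2}$ and therefore $p_{K,l}(\phi) \le |K|^{1/2} q_{K,l+1}(\phi)$. The
general case uses the same idea.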

10. Using the fact that $(\mathbb{R}, |\cdot|)$ is a sequentially complete HLCTVS, show that the
Euclidean spaces $(\mathbb{R}^n, |\cdot|)$ are sequentially complete HLCTVS too, for any
$n \in \mathbb{N}$.

11. Show that $C^k(\Omega)[\mathcal{P}_k(\Omega)]$ is sequentially complete for $\Omega \subseteq \mathbb{R}^n$, $\Omega$ open and
nonempty, $k = 0, 1, 2, \dots$.

Hints: The underlying ideas of the proof can best be explained for the case
$\Omega \subset \mathbb{R}$ and $k = 1$. Given a Cauchy sequence $(f_i)_{i \in \mathbb{N}}$ in $C^1(\Omega)[\mathcal{P}_1(\Omega)]$ and
any compact set $K \subset \Omega$ and any $r > 0$, there is $i_0 \in \mathbb{N}$ such that $p_{K,1}(f_i -
f_j) < r$ for all $i, j \ge i_0$. Observe, for $m = 0$ and $m = 1$ and every $x \in K$:
$|f_i^{(m)}(x) - f_j^{(m)}(x)| \le p_{K,1}(f_i - f_j)$. It follows, for $m \in \{0, 1\}$ and all $x \in K$, that
$(f_i^{(m)}(x))_{i \in \mathbb{N}}$ is a Cauchy sequence in $\mathbb{K}$ which is known to be complete. Hence
each of these Cauchy sequences converges to some number which we call $f_{(m)}(x)$,
i.e., $f_{(m)}(x) = \lim_{i \to \infty} f_i^{(m)}(x)$. Thus we get two functions $f_{(m)} : \Omega \to \mathbb{K}$. From
the assumed uniform convergence on all compact subsets we deduce that both
functions are continuous. Apply uniform convergence again to show for any
$x, y \in \Omega$ the following chain of identities: $f_{(0)}(x) - f_{(0)}(y) = \lim_{i \to \infty} (f_i(x) -
f_i(y)) = \lim_{i \to \infty} \int_y^x f_i^{(1)}(z)\, dz = \int_y^x f_{(1)}(z)\, dz$. Deduce that $f_{(0)}$ is continuously
differentiable with derivative $f_{(1)}$ and that the given sequence converges to $f_{(0)}$
in $C^1(\Omega)[\mathcal{P}_1(\Omega)]$.

12. Using the results of the previous problem show that the spaces $\mathcal{D}_K(\Omega)$ defined
in (2.4) are complete.

13. Consider the spaces $\mathcal{D}_K(\Omega)$ and $\mathcal{D}(\Omega)$ as introduced in (2.4), respectively (2.7),
and denote by $i_K : \mathcal{D}_K(\Omega) \to \mathcal{D}(\Omega)$ the identical embedding for $K \subset \Omega$
compact. Show: There is a strongest locally convex topology $\mathcal{T}$ on $\mathcal{D}(\Omega)$ such
that all embeddings $i_K$ are continuous. This topology is Hausdorff.

14. Prove: For any open nonempty subset $\Omega \subseteq \mathbb{R}^n$ there is a sequence of compact
sets $K_i \subset \Omega$ with the following properties: Each set $K_i$ has a nonempty interior.
$K_i$ is properly contained in $K_{i+1}$. $\bigcup_{i=1}^\infty K_i = \Omega$.

Hints: For $i \in \mathbb{N}$ define $\Omega_i = \{x \in \Omega : \operatorname{dist}(x, \partial\Omega) \ge \frac{1}{i}\}$ and $B_i = \{x \in \mathbb{R}^n :
|x| \le i\}$. Here $\operatorname{dist}(x, \partial\Omega)$ denotes the Euclidean distance of the point $x \in \Omega$
from the boundary of $\Omega$. Then show that the sets $K_i = B_i \cap \Omega_i$, for $i$ sufficiently
large, have the properties as claimed.

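15. Let $\Omega \subseteq \mathbb{R}^n$ be an open nonempty set. Show: For every closed ball $K_r(x) =
\{y \in \mathbb{R}^n : |y - x| \le r\} \subset \Omega$ with centre $x \in \Omega$ and radius $r > 0$ there is a
$\phi \in \mathcal{D}(\Omega)$, $\phi \ne 0$, with support $\operatorname{supp} \phi \subseteq K_r(x)$. Thus, in particular, $\mathcal{D}(\Omega)$ is
not empty.

Hints: Define a function $\rho : \mathbb{R}^n \to \mathbb{R}$ by


$$\rho(x) = \begin{cases} 0 & \text{for } |x| \ge 1, \\ \exp\Bigl(-\dfrac{1}{1 - x^2}\Bigr) & \text{for } |x| < 1, \end{cases} \tag{2.14}$$


and show that $\rho \in C^\infty(\mathbb{R}^n)$. Then define $\phi_r(y) = \rho\bigl(\frac{y - x}{r}\bigr)$ and deduce that
$\phi_r \in \mathcal{D}(\Omega)$ has the desired support properties.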

16. Prove: The space $\mathcal{S}(\Omega)$ is complete.

Hints: One can use the fact that the spaces $C^k(\Omega)[\mathcal{P}_k(\Omega)]$ are complete, for any
$k \in \mathbb{N}$. The decay properties need some additional considerations.
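
The bump function from the hints to the exercise on closed balls can be explored numerically. The following is a minimal one-dimensional sketch; it assumes the standard profile $\rho(x) = \exp(-1/(1 - x^2))$ for $|x| < 1$ and $\rho(x) = 0$ otherwise, and the function names and the parameters `center` and `radius` are hypothetical.

```python
import math

def rho(x):
    """Bump profile in one dimension: exp(-1/(1 - x^2)) for |x| < 1, else 0."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def phi(y, center, radius):
    """Rescaled bump rho((y - center) / radius); vanishes outside the closed ball."""
    return rho((y - center) / radius)

# The rescaled bump is positive inside the ball and zero on and beyond its boundary.
for y in (2.0, 2.25, 2.49, 2.5, 3.0):
    print(y, phi(y, center=2.0, radius=0.5))
```

The code only evaluates the function; smoothness at $|x| = 1$, which is the actual content of the exercise, still has to be shown by hand.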


Reference 


1. Grauert H, Fischer W. Differential- und Integralrechnung II. Heidelberger Taschenbücher. Vol. 36.
Berlin: Springer-Verlag; 1968.


Chapter 3 
Schwartz Distributions 


As we had mentioned in the introduction, the Schwartz approach to distribution theory 
defines distributions as continuous linear functions on a test function space. The var- 
ious classes of distributions are distinguished by the underlying test function spaces. 
Before we come to the definition of the main classes of Schwartz distribution, we 
collect some basic facts about continuous linear functions or functionals on a Haus- 
dorff locally convex topological vector space (HLCTVS) and about spaces of such 
functionals. Then the definition of the three main spaces of Schwartz distributions is 
straightforward. Numerous examples explain this definition. 

The remainder of this chapter introduces convergence of sequences and series of 
distributions and discusses localization, in particular, support and singular support 
of distributions. 


3.1. The Topological Dual of an HLCTVS 


Suppose that $X$ is a vector space over the field $\mathbb{K}$ on which a filtering system $\mathcal{P}$
of seminorms is given such that $X[\mathcal{P}]$ is an HLCTVS. The algebraic dual $X^*$ of
$X$ has been defined as the set of all linear functions or functionals $f : X \to \mathbb{K}$.
The topological dual is defined as the subset of those linear functions which are
continuous, i.e.,


$$X' \equiv X'[\mathcal{P}] = \{f \in X^* : f \text{ continuous}\}. \tag{3.1}$$


In a natural way, both X* and X’ are vector spaces over K. As a special case of 
Theorem 2.4, the following result is a convenient characterization of the elements of 
the topological dual of a HLCTVS. 


Proposition 3.1 Suppose that X[P]isaHLCTVS and f : X — Kalinear function. 
Then the following statements are equivalent. 


(a) f is continuous, i.e. f € X'. 
(b) Thereisaseminorm p € P andanonnegative number X such that | f (x)| < Ap(x) 
forall x € X. 


© Springer International Publishing Switzerland 2015 25 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_3 


(c) There is a seminorm $p \in \mathcal{P}$ such that $f$ is bounded on the $p$-ball $B_{p,1}(0)$.


Proof The equivalence of statements (a) and (b) is just the special case $Y[\mathcal{Q}] =
\mathbb{K}[\{|\cdot|\}]$ of Theorem 2.4.

The equivalence of (b) and (c) follows easily from Lemma 2.2 if we introduce the
seminorm $q(x) = |f(x)|$ on $X$ and if we observe that then (b) says $q \leq \lambda p$ while (c)
translates into $B_{p,1}(0) \subseteq B_{q,\lambda}(0)$ for some $\lambda > 0$.

The geometrical interpretation of linear functionals is often helpful, in particular
in infinite dimensional spaces. We give a brief review. Recall: A hyperplane through
the origin is a maximal proper subspace $H$ of a vector space $X$. If such a hyperplane is
given, there is a point $a \in X \setminus H$ such that the vector space $X$ over the field $\mathbb{K}$ has
the representation


$$X = H + \mathbb{K}a,$$


i.e., every point $x \in X$ has the unique representation $x = h + \alpha a$ with $h \in H$ and
$\alpha \in \mathbb{K}$. The announced geometrical characterization now is


Proposition 3.2 Let $X[\mathcal{P}]$ be an HLCTVS over the field $\mathbb{K}$.


(a) A linear functional $f \in X^*$, $f \neq 0$, is characterized by
    (i) a hyperplane $H \subset X$ through the origin and
    (ii) the value in a point $x_0 \in X \setminus H$.
    The connection between the functional $f$ and the hyperplane is given by


$$H = \ker f = \{ x \in X : f(x) = 0 \}.$$


(b) A linear functional $f$ on $X$ is continuous if, and only if, in the geometric
    characterization (a) the hyperplane $H$ is closed.


Proof Given $f \in X^*$, the kernel or null space $\ker f$ is easily seen to be a linear
subspace of $X$. Since $f \neq 0$, there is a point in $X$ at which $f$ does not vanish.
By rescaling this point we get a point $a \in X \setminus \ker f$ with $f(a) = 1$. We claim that
$H = \ker f$ is a hyperplane. Given any point $x \in X$ observe $x = x - f(x)a + f(x)a$
where $h = x - f(x)a \in \ker f$ since $f(h) = f(x) - f(x)f(a) = 0$ and $f(x)a \in \mathbb{K}a$.
The representation $x = h + \alpha a$ with $h \in \ker f$ and $\alpha \in \mathbb{K}$ is unique: If one has, for
some $x \in X$, $x = h_1 + \alpha_1 a = h_2 + \alpha_2 a$ with $h_i \in \ker f$ then $h_1 - h_2 = (\alpha_2 - \alpha_1)a$
and thus $0 = f(h_1 - h_2) = (\alpha_2 - \alpha_1) f(a) = \alpha_2 - \alpha_1$, hence $\alpha_1 = \alpha_2$ and $h_1 = h_2$.

Conversely, assume that $H$ is a hyperplane through the origin and $a \in X \setminus H$.
Then every point $x \in X$ has the unique representation $x = h + \alpha a$ with $h \in H$
and $\alpha \in \mathbb{K}$. Now define $f_H : X \to \mathbb{K}$ by $f_H(x) = f_H(h + \alpha a) = \alpha$. It is an
elementary calculation to show that $f_H$ is a well defined linear function. Certainly
one has $\ker f_H = H$. This proves part (a).

In order to prove part (b), we have to show that $H = \ker f$ is closed if, and
only if, the linear functional $f$ is continuous. When $f$ is continuous then $\ker f$
is closed as the inverse image of the closed set $\{0\}$. Conversely, assume that $H =
\ker f$ is closed. Then its complement $X \setminus H$ is open and there is some open $p$-ball
$B_{p,r}(a) \subset X \setminus H$ around the point $a$, $f(a) = 1$. In order to prove continuity of $f$ it
suffices, according to Proposition 3.1, to show that $f$ is bounded on the open ball
$B_{p,r}(0)$. This is done indirectly. If there were some $x \in B_{p,r}(0)$ with $|f(x)| \geq 1$
then $y = a - \frac{x}{f(x)} \in B_{p,r}(a)$ and $f(y) = f(a) - \frac{f(x)}{f(x)} = 1 - 1 = 0$, i.e., $y \in H$, a
contradiction. Therefore, $f$ is bounded on $B_{p,r}(0)$ by 1 and we conclude.
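
The decomposition $x = h + f(x)a$ used in the proof of part (a) can be checked in a finite dimensional toy case. The sketch below assumes $X = \mathbb{R}^3$ and a hypothetical functional `f`; neither is taken from the text.

```python
def f(x):
    """A hypothetical linear functional on R^3: f(x) = 2*x1 - x2 + 3*x3."""
    return 2.0 * x[0] - x[1] + 3.0 * x[2]

def decompose(x, a):
    """Split x = h + alpha * a with alpha = f(x) and h = x - f(x) * a (requires f(a) = 1)."""
    alpha = f(x)
    h = [xi - alpha * ai for xi, ai in zip(x, a)]
    return h, alpha

a = [0.5, 0.0, 0.0]           # normalised so that f(a) = 1
x = [1.0, 4.0, -2.0]          # f(x) = 2 - 4 - 6 = -8
h, alpha = decompose(x, a)    # h = [5, 4, -2] lies in ker f
print(h, alpha, f(h))
```

Here the hyperplane $H = \ker f$ is the plane $2x_1 - x_2 + 3x_3 = 0$, and `alpha` is exactly the value $f(x)$, as in the proof.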


3.2. Definition of Distributions 


For an open nonempty subset $\Omega \subset \mathbb{R}^n$, we have introduced the test function spaces
$\mathcal{D}(\Omega)$, $\mathcal{S}(\Omega)$, and $\mathcal{E}(\Omega)$ as HLCTVSs. Furthermore the relation


$$\mathcal{D}(\Omega) \subset \mathcal{S}(\Omega) \subset \mathcal{E}(\Omega)$$


with continuous embeddings in both cases has been established (see Theorem 2.8).
This section gives the definitions of the three basic classes of distributions
as elements of the topological dual spaces of these test function spaces. Elements
of the topological dual $\mathcal{D}'(\Omega)$ of $\mathcal{D}(\Omega)$ are called distributions on $\Omega$. Elements of
the topological dual $\mathcal{S}'(\Omega)$ of $\mathcal{S}(\Omega)$ are called tempered distributions and elements
of the topological dual $\mathcal{E}'(\Omega)$ of $\mathcal{E}(\Omega)$ are called distributions of compact support. Later,
after further preparation, the names for the latter two classes of distributions will become
apparent. The continuous embeddings mentioned above imply the following relation
between these three classes of distributions, which justifies calling elements of $\mathcal{S}'(\Omega)$,
respectively of $\mathcal{E}'(\Omega)$, distributions:


$$\mathcal{E}'(\Omega) \subset \mathcal{S}'(\Omega) \subset \mathcal{D}'(\Omega). \tag{3.2}$$


We proceed with a more explicit discussion of distributions.


Definition 3.1 A distribution $T$ on an open nonempty subset $\Omega \subset \mathbb{R}^n$ is a continuous
linear functional on the test function space $\mathcal{D}(\Omega)$ of $\mathcal{C}^\infty$-functions of compact
support. The set of all distributions on $\Omega$ equals the topological dual $\mathcal{D}'(\Omega)$ of
$\mathcal{D}(\Omega)$.

Another way to define a distribution on a nonempty open subset $\Omega \subset \mathbb{R}^n$ is to
recall Proposition 2.6 and to define: A linear functional $T$ on $\mathcal{D}(\Omega)$ is a distribution
on $\Omega$ if, and only if, its restriction to the spaces $\mathcal{D}_K(\Omega)$ is continuous for every
compact subset $K \subset \Omega$. Taking Theorem 2.4 into account one arrives at the following
characterization of distributions.


Theorem 3.1 A linear functional $T : \mathcal{D}(\Omega) \to \mathbb{K}$ is a distribution on the open
nonempty set $\Omega \subset \mathbb{R}^n$ if, and only if, for every compact subset $K \subset \Omega$ there exist a
number $C \in \mathbb{R}_+$ and a natural number $m \in \mathbb{N}$, both depending in general on $K$ and
$T$, such that for all $\phi \in \mathcal{D}_K(\Omega)$ the estimate


$$|T(\phi)| \leq C\, p_{K,m}(\phi) \tag{3.3}$$


holds.




An equivalent way to express this is the following: 


Corollary 3.1 A linear function $T : \mathcal{D}(\Omega) \to \mathbb{K}$ is a distribution on $\Omega$ if, and only
if, for every compact subset $K \subset \Omega$ there is an integer $m$ such that


$$p'_{K,m}(T) = \sup \{ |T(\phi)| : \phi \in \mathcal{D}_K(\Omega),\ p_{K,m}(\phi) \leq 1 \} \tag{3.4}$$


is finite and then


$$|T(\phi)| \leq p'_{K,m}(T)\, p_{K,m}(\phi) \quad \forall \phi \in \mathcal{D}_K(\Omega).$$


The proof of the corollary is left as an exercise. This characterization leads to the
important concept of the order of a distribution.


Definition 3.2 Let $T$ be a distribution on $\Omega \subset \mathbb{R}^n$, $\Omega$ open and nonempty, and let
$K \subset \Omega$ be a compact subset. Then the local order $O(T, K)$ of $T$ on $K$ is defined
as the minimum of all natural numbers $m$ for which (3.3) holds. The order $O(T)$ of
$T$ is the supremum over all local orders.

In terms of the concept of order, Theorem 3.1 says: Locally every distribution is
of finite order, i.e., a finite number of derivatives of the test functions $\phi$ are used in
the estimate (3.3) (recall the definition of the seminorms $p_{K,m}$ in Eq. (2.1)).
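
As a worked illustration of Definition 3.2 (not taken from the text), consider on $\Omega = \mathbb{R}$ the functional $T(\phi) = \phi'(0)$. The sketch assumes that the seminorms of Eq. (2.1) have the usual form $p_{K,m}(\phi) = \sup_{|\alpha| \leq m} \sup_{x \in K} |D^\alpha \phi(x)|$; the cut-off function $\chi$ is an auxiliary choice.

```latex
% Upper bound: for every compact K and every \phi \in \mathcal{D}_K(\mathbb{R}),
|T(\phi)| = |\phi'(0)| \leq \sup_{x \in K} |\phi'(x)| \leq p_{K,1}(\phi),
% so (3.3) holds with C = 1 and m = 1, i.e. O(T, K) \leq 1.
%
% No bound with m = 0 if 0 is an interior point of K: choose \chi \in \mathcal{D}(\mathbb{R})
% with \chi = 1 near 0 and supp \chi \subset [-1, 1], and set \phi_j(x) = x \chi(jx). Then
\phi_j'(0) = \chi(0) = 1, \qquad
p_{K,0}(\phi_j) = \sup_{x} |x \chi(jx)| = \frac{1}{j} \sup_{y} |y \chi(y)| \to 0 \quad (j \to \infty),
% and supp \phi_j \subset [-1/j, 1/j] \subset K for large j. Hence no constant C with
% |T(\phi_j)| \leq C p_{K,0}(\phi_j) for all j can exist, and O(T, K) = 1.
```

For compact sets $K$ that do not contain the origin in their interior, $T$ vanishes on $\mathcal{D}_K(\mathbb{R})$ and the local order is 0, so the order of this $T$ is 1.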


Remark 3.1 


1. As the topological dual of the HLCTVS $\mathcal{D}(\Omega)$, the set of all distributions on an
open set $\Omega \subset \mathbb{R}^n$ forms naturally a vector space over the field $\mathbb{K}$. Addition and
scalar multiplication are explicitly given as follows: For all $T, T_i \in \mathcal{D}'(\Omega)$ and
all $\lambda \in \mathbb{K}$,


$$\forall \phi \in \mathcal{D}(\Omega): \quad (T_1 + T_2)(\phi) = T_1(\phi) + T_2(\phi), \qquad (\lambda T)(\phi) = \lambda T(\phi).$$


Thus, $(T, \phi) \mapsto T(\phi)$ is a bilinear function $\mathcal{D}' \times \mathcal{D} \to \mathbb{K}$.

2. According to their definition, distributions assign real or complex numbers $T(\phi)$
to a test function $\phi \in \mathcal{D}(\Omega)$. A frequently used alternative notation for the value
$T(\phi)$ of the function $T$ is


$$T(\phi) = \langle T, \phi \rangle = \langle T(x), \phi(x) \rangle.$$


3. In physics textbooks one often finds the notation $\int_\Omega T(x)\phi(x)\,dx$ for the value
$T(\phi)$ of the distribution $T$ at the test function $\phi$. This suggestive notation is rather
formal since, when one wants to make sense of this expression, the integral
sign used has little to do with the standard integrals (further details are provided
in the section on representation of distributions as “generalized” derivatives of
continuous functions).

4. The axiom of choice allows us to show that there are linear functionals on $\mathcal{D}_K(\Omega)$
which are not continuous. But nobody has succeeded in giving an explicit example
of such a noncontinuous functional. Thus, in practice one does not encounter these
exceptional functionals.


5. One may wonder why we spoke about $\mathcal{D}(\Omega)$ as the test function space of distribution
theory. Naturally, $\mathcal{D}(\Omega)$ is not given a priori. One has to make a choice.
The use of $\mathcal{D}(\Omega)$ is justified a posteriori by many successful applications. Nevertheless,
there are some guiding principles for the choice of test function spaces
(compare the introductory remarks on the goals of distribution theory).

a) The choice of test function spaces as subspaces of the space of $\mathcal{C}^\infty$-functions
on which all derivative monomials $D^\alpha$ act linearly and continuously ensures
that all distributions will be infinitely often differentiable too.

b) Further restrictions on the subspace of $\mathcal{C}^\infty$-functions as a test function space
depend on the intended use of the resulting space of generalized functions.
For instance, the choice of $\mathcal{C}^\infty$-functions on $\Omega$ with compact support ensures
that the resulting distributions on $\Omega$ are not restricted in their behavior at the
boundary of the set $\Omega$. Later we will see that the test function space of $\mathcal{C}^\infty$-functions
which are strongly decreasing ensures that the resulting space of
generalized functions admits the Fourier transformation as an isomorphism,
which has many important consequences.


A number of concrete examples will help to explain how the above definition operates
in concrete cases. The first class of examples shows furthermore how distributions
generalize functions so that it is appropriate to speak about distributions as special
classes of generalized functions. Later, we will give an overview of some other
classes of generalized functions.


3.2.1 The Regular Distributions 


Suppose that $f : \Omega \to \mathbb{K}$ is a continuous function on the open nonempty set $\Omega \subset \mathbb{R}^n$.
Then, for every compact subset $K \subset \Omega$ the (Riemann) integral $\int_K |f(x)|\,dx = C$ is
known to exist. Hence, for all $\phi \in \mathcal{D}_K(\Omega)$ one has


$$\left| \int_K f(x)\phi(x)\,dx \right| \leq \int_K |f(x)\phi(x)|\,dx \leq \sup_{x \in K} |\phi(x)| \int_K |f(x)|\,dx.$$


It follows that $I_f : \mathcal{D}(\Omega) \to \mathbb{K}$ is well defined by


$$\langle I_f, \phi \rangle = \int f(x)\phi(x)\,dx \quad \forall \phi \in \mathcal{D}(\Omega)$$

and that for all $\phi \in \mathcal{D}_K(\Omega)$ one has the estimate


$$|\langle I_f, \phi \rangle| \leq C\, p_{K,0}(\phi).$$


Elementary properties of the Riemann integral imply that $I_f$ is a linear functional on
$\mathcal{D}(\Omega)$. Since we could establish the estimate (3.3) in Theorem 3.1, it follows that $I_f$
is continuous and thus a distribution on $\Omega$. In addition this estimate shows that the
local order and the order of the distribution $I_f$ are 0.


Obviously these considerations apply to any $f \in \mathcal{C}(\Omega)$. Therefore, $f \mapsto I_f$
defines a map $I : \mathcal{C}(\Omega) \to \mathcal{D}'(\Omega)$ which is easily seen to be linear. In the Exercises
it is shown that $I$ is injective and thus provides an embedding of the space of all
continuous functions into the space of distributions.
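
The order-zero estimate $|\langle I_f, \phi \rangle| \leq C\, p_{K,0}(\phi)$ can be illustrated numerically. The following is a minimal sketch in one dimension, assuming $K = [-1, 1]$, the hypothetical choice $f(x) = \cos 3x$, a standard bump as test function, and a simple midpoint rule in place of the exact integrals.

```python
import math

def bump(x):
    """Test function with support [-1, 1]: exp(-1/(1 - x^2)) inside, 0 outside."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def midpoint(g, a, b, n=20_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def f(x):
    """Hypothetical continuous function defining the regular distribution I_f."""
    return math.cos(3.0 * x)

pairing = midpoint(lambda x: f(x) * bump(x), -1.0, 1.0)   # approximates <I_f, phi>
C = midpoint(lambda x: abs(f(x)), -1.0, 1.0)              # approximates the integral of |f| over K
p_K0 = math.exp(-1.0)                                     # sup of the bump, attained at x = 0
print(abs(pairing), C * p_K0)
```

Only the supremum of the test function enters the bound, no derivatives, which is what order 0 means.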

Note that the decisive property we used for the embedding of continuous functions
into the space of distributions was that, for $f \in \mathcal{C}(\Omega)$ and every compact subset $K \subset \Omega$,
the Riemann integral $C = \int_K |f(x)|\,dx$ is finite. Therefore, the same ideas allow
us to consider a much larger space of functions on $\Omega$ as distributions, namely the
space $L^1_{\mathrm{loc}}(\Omega)$ of all locally integrable functions on $\Omega$. $L^1_{\mathrm{loc}}(\Omega)$ is the space of
all (equivalence classes of) Lebesgue measurable functions on $\Omega$ for which the
Lebesgue integral


$$\|f\|_{1,K} = \int_K |f(x)|\,dx \tag{3.5}$$


is finite for every compact subset $K \subset \Omega$. Thus, the map $I$ can be extended to a
map $I : L^1_{\mathrm{loc}}(\Omega) \to \mathcal{D}'(\Omega)$ by the same formula: For every $f \in L^1_{\mathrm{loc}}(\Omega)$ define
$I_f : \mathcal{D}(\Omega) \to \mathbb{K}$ by


$$I_f(\phi) = \int f(x)\phi(x)\,dx \quad \forall \phi \in \mathcal{D}(\Omega).$$


The bound $|I_f(\phi)| \leq \|f\|_{1,K}\, p_{K,0}(\phi)$ for all $\phi \in \mathcal{D}_K(\Omega)$ proves as above that $I_f \in
\mathcal{D}'(\Omega)$ for all $f \in L^1_{\mathrm{loc}}(\Omega)$. A simple argument implies that $I$ is a linear map
and in the Exercises we prove that $I$ is injective, i.e., $I_f = 0$ in $\mathcal{D}'(\Omega)$ if, and
only if, $f = 0$ in $L^1_{\mathrm{loc}}(\Omega)$. Therefore, $I$ is an embedding of $L^1_{\mathrm{loc}}(\Omega)$ into $\mathcal{D}'(\Omega)$.
The space $L^1_{\mathrm{loc}}(\Omega)$ is an HLCTVS when it is equipped with the filtering system of
seminorms $\{\|\cdot\|_{1,K} : K \subset \Omega \text{ compact}\}$. With respect to this topology, the embedding
$I$ is continuous in the following sense. If $(f_j)_{j \in \mathbb{N}}$ is a sequence which converges to
zero in $L^1_{\mathrm{loc}}(\Omega)$, then, for every $\phi \in \mathcal{D}(\Omega)$, one has $\lim_{j \to \infty} I_{f_j}(\phi) = 0$ which
follows easily from the bound given above. We summarize our discussion as the
so-called embedding theorem.


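Theorem 3.2 The space $L^1_{\mathrm{loc}}(\Omega)$ of locally integrable functions on an open
nonempty set $\Omega \subset \mathbb{R}^n$ is embedded into the space $\mathcal{D}'(\Omega)$ of distributions on $\Omega$
by the linear and continuous injection $I$. The image of $L^1_{\mathrm{loc}}(\Omega)$ under $I$ is called the
space of regular distributions on $\Omega$:

$$\mathcal{D}'_{\mathrm{reg}}(\Omega) = I(L^1_{\mathrm{loc}}(\Omega)) \subset \mathcal{D}'(\Omega). \tag{3.6}$$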


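Note that under the identification of $f$ and $I_f$ we have established the following
chain of relations:


$$\mathcal{C}(\Omega) \subset L^r_{\mathrm{loc}}(\Omega) \subset L^1_{\mathrm{loc}}(\Omega) \subset \mathcal{D}'(\Omega)$$


for any $r \geq 1$, since for $r \geq 1$ the space of measurable functions $f$ on $\Omega$ for which
$|f|^r$ is locally integrable is known to be contained in $L^1_{\mathrm{loc}}(\Omega)$.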


3.2.2 Some Standard Examples of Distributions


3.2.2.1 Dirac’s Delta Distribution


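For any point $a \in \Omega \subset \mathbb{R}^n$ define a functional $\delta_a : \mathcal{D}(\Omega) \to \mathbb{K}$ by

$$\delta_a(\phi) = \phi(a) \quad \forall \phi \in \mathcal{D}(\Omega).$$


Obviously $\delta_a$ is linear. For any compact subset $K \subset \Omega$, one has the following
estimate:

$$|\delta_a(\phi)| \leq C(a, K)\, p_{K,0}(\phi) \quad \forall \phi \in \mathcal{D}_K(\Omega)$$

where the constant $C(a, K)$ equals 1 if $a \in K$ and $C(a, K) = 0$ otherwise. Therefore,
the linear functional $\delta_a$ is continuous on $\mathcal{D}(\Omega)$ and thus a distribution. Its order
obviously is zero. In the Exercises it is shown that $\delta_a$ is not a regular distribution,
i.e., there is no $f \in L^1_{\mathrm{loc}}(\Omega)$ such that $\delta_a(\phi) = \int f(x)\phi(x)\,dx$ for all $\phi \in \mathcal{D}(\Omega)$.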


3.2.2.2 Cauchy’s Principal Value 


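It is easy to see that $x \mapsto \frac{1}{x}$ is not a locally integrable function on the real line
$\mathbb{R}$, hence $I_{1/x}$ does not define a regular distribution. Nevertheless, one can define a
distribution on $\mathbb{R}$ which agrees with $I_{1/x}$ on $\mathbb{R} \setminus \{0\}$. This distribution is called Cauchy’s
principal value and is defined by


$$\left\langle \mathrm{vp}\frac{1}{x}, \phi \right\rangle = \lim_{r \to 0} \int_{|x| \geq r} \frac{\phi(x)}{x}\,dx. \tag{3.7}$$


We have to show that this limit exists and that it defines a continuous linear functional
on $\mathcal{D}(\mathbb{R})$. For $a > 0$ consider the compact interval $K = [-a, a]$. Take $0 < r < a$
and calculate, for all $\phi \in \mathcal{D}_K(\mathbb{R})$,


$$\int_{|x| \geq r} \frac{\phi(x)}{x}\,dx = \int_r^a \frac{\phi(x) - \phi(-x)}{x}\,dx.$$


If we observe that $\phi(x) - \phi(-x) = x \int_{-1}^{+1} \phi'(xt)\,dt$, we get the estimate


$$\left| \frac{\phi(x) - \phi(-x)}{x} \right| \leq 2 \sup_{y \in K} |\phi'(y)| \leq 2\, p_{K,1}(\phi),$$


and thus $\left| \int_r^a \frac{\phi(x) - \phi(-x)}{x}\,dx \right| \leq 2a\, p_{K,1}(\phi)$ uniformly in $0 < r < a$, for all $\phi \in \mathcal{D}_K(\mathbb{R})$.
It follows that this limit exists and that it has the value


$$\lim_{r \to 0} \int_{|x| \geq r} \frac{\phi(x)}{x}\,dx = \int_0^\infty \frac{\phi(x) - \phi(-x)}{x}\,dx.$$


Furthermore, the continuity bound


$$\left| \left\langle \mathrm{vp}\frac{1}{x}, \phi \right\rangle \right| \leq |K|\, p_{K,1}(\phi)$$


for all $\phi \in \mathcal{D}_K(\mathbb{R})$ follows. Therefore, $\mathrm{vp}\frac{1}{x}$ is a well-defined distribution on $\mathbb{R}$
according to Theorem 3.1. Its order obviously is 1.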

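The above proof gives the following convenient formula for Cauchy’s principal
value:


$$\left\langle \mathrm{vp}\frac{1}{x}, \phi \right\rangle = \int_0^\infty \frac{\phi(x) - \phi(-x)}{x}\,dx. \tag{3.8}$$


Test functions in $\mathcal{D}(\mathbb{R} \setminus \{0\})$ have the property that they vanish in some neighborhood
of the origin (depending on the function). Hence, for these test functions the
singular point $x = 0$ of $\frac{1}{x}$ is avoided, and thus it follows that


$$\lim_{r \to 0} \int_{|x| \geq r} \frac{\phi(x)}{x}\,dx = \int_{\mathbb{R}} \frac{\phi(x)}{x}\,dx = \langle I_{1/x}, \phi \rangle \quad \forall \phi \in \mathcal{D}(\mathbb{R} \setminus \{0\}).$$


Sometimes one also finds the notation $\mathrm{vp} \int_{\mathbb{R}} \frac{\phi(x)}{x}\,dx$ for $\langle \mathrm{vp}\frac{1}{x}, \phi \rangle$. The letters “vp” in
the notation for Cauchy’s principal value stand for the original French name “valeur
principale.”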
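
The agreement between the defining limit (3.7) and formula (3.8) can be observed numerically. The sketch below assumes a hypothetical test function (a bump centred at 0.3, so that the principal value does not vanish by symmetry) and a plain midpoint rule; the truncation radius `1e-3` is an arbitrary illustrative choice.

```python
import math

def rho(x):
    """Bump profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def phi(x):
    """Hypothetical test function: a bump centred at 0.3 with support [-0.7, 1.3]."""
    return rho(x - 0.3)

def midpoint(g, a, b, n=100_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def pv_truncated(r):
    """Right-hand side of (3.7) at finite r: integral of phi(x)/x over |x| >= r."""
    left = midpoint(lambda x: phi(x) / x, -0.7, -r)
    right = midpoint(lambda x: phi(x) / x, r, 1.3)
    return left + right

def pv_formula():
    """Formula (3.8): integral over (0, 1.3) of (phi(x) - phi(-x)) / x."""
    return midpoint(lambda x: (phi(x) - phi(-x)) / x, 0.0, 1.3)

value_38 = pv_formula()
value_37 = pv_truncated(1e-3)
print(value_38, value_37)
```

The two numbers differ by the integral of $(\phi(x) - \phi(-x))/x$ over $(0, r)$, which the estimate in the text bounds by $2r\, p_{K,1}(\phi)$; the integrand in (3.8) stays bounded near the origin, which makes this form better suited for numerical work.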


3.2.2.3 Hadamard’s Principal Values 


Closely related to Cauchy’s principal value is a family of distributions on $\mathbb{R}$ which
can be traced back to Hadamard. Certainly, for $1 < \beta < 2$ the function $\frac{1}{x^\beta}$ is not
locally integrable on $\mathbb{R}_+$. We are going to define a distribution $T$ on $\mathbb{R}_+$ which agrees
on $\mathbb{R}_+ \setminus \{0\} = (0, \infty)$ with the regular distribution $I_{x^{-\beta}}$. For all $\phi \in \mathcal{D}(\mathbb{R})$ define


$$\langle T, \phi \rangle = \int_0^\infty \frac{\phi(x) - \phi(0)}{x^\beta}\,dx.$$


Since again $\phi(x) - \phi(0) = x \int_0^1 \phi'(xt)\,dt$ we can estimate


$$\left| \frac{\phi(x) - \phi(0)}{x^\beta} \right| \leq |x|^{1 - \beta}\, p_{K,1}(\phi)$$


if $\phi \in \mathcal{D}_K(\mathbb{R})$. Since now the exponent $\gamma = 1 - \beta$ is larger than $-1$, the integral exists
over compact subsets. Hence, $T$ is well defined on $\mathcal{D}(\mathbb{R})$. Elementary properties of
integrals imply that $T$ is linear and the above estimate implies, as in the previous
example, the continuity bound. Therefore, $T$ is a distribution on $\mathbb{R}$.

If $\phi \in \mathcal{D}(\mathbb{R} \setminus \{0\})$, then in particular $\phi(x) = 0$ for all $x \in \mathbb{R}$, $|x| < r$ for some
$r > 0$, and we get $\langle T, \phi \rangle = \int_0^\infty \frac{\phi(x)}{x^\beta}\,dx = I_{x^{-\beta}}(\phi)$. Hence, on $\mathbb{R} \setminus \{0\}$ the distribution
$T$ is regular.

Distributions like Cauchy’s and Hadamard’s principal values are also called
pseudo functions, since away from the origin $x = 0$ they coincide with the corresponding
regular distributions. Thus, we can consider the pseudo functions as
extensions of the regular distributions to the point $x = 0$.




3.3 Convergence of Sequences and Series of Distributions 


Often the need arises to approximate given distributions by “simpler” distributions,
for instance functions. For this one obviously needs a topology on the space $\mathcal{D}'(\Omega)$
of all distributions on a nonempty open set $\Omega \subset \mathbb{R}^n$. A topology which suffices for
our purposes is the so-called weak topology which is defined on $\mathcal{D}'(\Omega)$ by the system
of seminorms $\mathcal{P}_\sigma = \{\rho_\phi : \phi \in \mathcal{D}(\Omega)\}$. Here $\rho_\phi$ is defined by


$$\rho_\phi(T) = |\langle T, \phi \rangle| = |T(\phi)| \quad \text{for all } T \in \mathcal{D}'(\Omega).$$


This topology is usually denoted by $\sigma = \sigma(\mathcal{D}', \mathcal{D})$.

If not stated explicitly otherwise we consider $\mathcal{D}'(\Omega)$ always equipped with this
topology $\sigma$. Then, from our earlier discussions on HLCTVS, we know in principle
what convergence in $\mathcal{D}'$ means or what a Cauchy sequence of distributions is. For
clarity we write down these definitions explicitly.


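Definition 3.3 Let $\Omega \subset \mathbb{R}^n$ be open and nonempty and let $(T_j)_{j \in \mathbb{N}}$ be a sequence
of distributions on $\Omega$, i.e., a sequence in $\mathcal{D}'(\Omega)$. One says:


1. $(T_j)_{j \in \mathbb{N}}$ converges in $\mathcal{D}'(\Omega)$ if, and only if, there is a $T \in \mathcal{D}'(\Omega)$ such that for
every $\phi \in \mathcal{D}(\Omega)$ the numerical sequence $(T_j(\phi))_{j \in \mathbb{N}}$ converges in $\mathbb{K}$ to $T(\phi)$.

2. $(T_j)_{j \in \mathbb{N}}$ is a Cauchy sequence in $\mathcal{D}'(\Omega)$ if, and only if, for every $\phi \in \mathcal{D}(\Omega)$ the
numerical sequence $(T_j(\phi))_{j \in \mathbb{N}}$ is a Cauchy sequence in $\mathbb{K}$.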


Several simple examples will illustrate these definitions and how these concepts 
are applied to concrete problems. All sequences we consider here are sequences of 
regular distributions defined by sequences of functions which have no limit in the 
sense of functions. 


Example 3.1 


1. The sequence of $\mathcal{C}^\infty$-functions $f_j(x) = \sin jx$ on $\mathbb{R}$ certainly has no limit in the
sense of functions. We claim that the sequence of regular distributions $T_j = I_{f_j}$
defined by these functions converges in $\mathcal{D}'(\mathbb{R})$ to zero. For the proof take any
$\phi \in \mathcal{D}(\mathbb{R})$. A partial integration shows that


$$\langle T_j, \phi \rangle = \int \sin(jx)\,\phi(x)\,dx = \frac{1}{j} \int \cos(jx)\,\phi'(x)\,dx$$


and we conclude that $\lim_{j \to \infty} \langle T_j, \phi \rangle = 0$.
2. Delta sequences: $\delta$-sequences are sequences of functions which converge in $\mathcal{D}'$
to Dirac’s delta distribution. We present three examples of such sequences.
a) Consider the sequence of continuous functions $t_j(x) = \frac{\sin(jx)}{x}$ and denote
$T_j = I_{t_j}$. Then

$$\lim_{j \to \infty} T_j = \pi\delta \quad \text{in } \mathcal{D}'(\mathbb{R}).$$



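For the proof take any $\phi \in \mathcal{D}(\mathbb{R})$. Then the support of $\phi$ is contained in $[-a, a]$
for some $a > 0$. It follows that


$$\langle T_j, \phi \rangle = \int_{-a}^{+a} \frac{\sin(jx)}{x}\,\phi(x)\,dx
= \int_{-a}^{+a} \frac{\sin(jx)}{x}\,[\phi(x) - \phi(0)]\,dx + \int_{-a}^{+a} \frac{\sin(jx)}{x}\,\phi(0)\,dx.$$


As in the first example, a partial integration shows that


$$\int_{-a}^{+a} \frac{\sin(jx)}{x}\,[\phi(x) - \phi(0)]\,dx = \frac{1}{j} \int_{-a}^{+a} \cos(jx)\,\frac{d}{dx}\,\frac{\phi(x) - \phi(0)}{x}\,dx + \frac{2\cos(ja)}{ja}\,\phi(0)$$


converges to zero for $j \to \infty$ (the last term is the boundary contribution, which uses $\phi(\pm a) = 0$). Then recall the integral:


$$\int_{-a}^{+a} \frac{\sin(jx)}{x}\,dx = \int_{-ja}^{+ja} \frac{\sin y}{y}\,dy \xrightarrow[j \to \infty]{} \int_{-\infty}^{+\infty} \frac{\sin y}{y}\,dy = \pi.$$


We conclude that $\lim_{j \to \infty} \langle T_j, \phi \rangle = \pi\phi(0)$ for every $\phi \in \mathcal{D}(\mathbb{R})$ which proves
the statement.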
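b) Take any nonnegative function $f \in L^1(\mathbb{R}^n)$ with $\int_{\mathbb{R}^n} f(x)\,dx = 1$. Introduce
the sequence of functions $f_j(x) = j^n f(jx)$ and the associated sequence of
regular distributions $T_j = I_{f_j}$. We claim:

$$\lim_{j \to \infty} T_j = \delta \quad \text{in } \mathcal{D}'(\mathbb{R}^n).$$


The proof is simple. Take any $\phi \in \mathcal{D}(\mathbb{R}^n)$ and calculate as above,


$$\langle T_j, \phi \rangle = \int_{\mathbb{R}^n} f_j(x)\phi(x)\,dx
= \int_{\mathbb{R}^n} f_j(x)[\phi(x) - \phi(0)]\,dx + \int_{\mathbb{R}^n} f_j(x)\phi(0)\,dx.$$


To the first term

$$\int_{\mathbb{R}^n} f_j(x)[\phi(x) - \phi(0)]\,dx = \int_{\mathbb{R}^n} j^n f(jx)[\phi(x) - \phi(0)]\,dx
= \int_{\mathbb{R}^n} f(y)\left[\phi\left(\frac{y}{j}\right) - \phi(0)\right]dy$$


we apply Lebesgue’s dominated convergence theorem to conclude that the
limit $j \to \infty$ of this term vanishes. For the second term note that
$\int_{\mathbb{R}^n} f_j(x)\,dx = \int_{\mathbb{R}^n} f(y)\,dy = 1$ for all $j \in \mathbb{N}$ and we conclude.

As a special case of this result we mention that we can take in particular
$f \in \mathcal{D}(\mathbb{R}^n)$. This then shows that Dirac’s delta distribution is the limit in $\mathcal{D}'$
of a sequence of $\mathcal{C}^\infty$-functions of compact support.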

c) For the last example of a delta sequence we start with the Gauss function on
$\mathbb{R}^n$: $g(x) = \pi^{-n/2} e^{-x^2}$. Certainly $0 \leq g \in L^1(\mathbb{R}^n)$ and thus we can proceed
as in the previous example. The sequence of scaled Gauss functions $g_j(x) =
j^n g(jx)$ converges in the sense of distributions to Dirac’s delta distribution,
i.e., for every $\phi \in \mathcal{D}(\mathbb{R}^n)$:


$$\lim_{j \to \infty} \langle I_{g_j}, \phi \rangle = \phi(0) = \langle \delta, \phi \rangle.$$


This example shows that Dirac’s delta can also be approximated by a sequence
of strongly decreasing $\mathcal{C}^\infty$-functions.
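3. Now we prove the Breit-Wigner formula. For each $\varepsilon > 0$ define a function
$f_\varepsilon : \mathbb{R} \to \mathbb{R}$ by


$$f_\varepsilon(x) = \frac{\varepsilon}{x^2 + \varepsilon^2} = \operatorname{Im} \frac{1}{x - i\varepsilon} = \frac{i}{2} \left[ \frac{1}{x + i\varepsilon} - \frac{1}{x - i\varepsilon} \right].$$

We claim that

$$\lim_{\varepsilon \to 0} f_\varepsilon = \pi\delta \quad \text{in } \mathcal{D}'(\mathbb{R}). \tag{3.9}$$

Often this is written as

$$\lim_{\varepsilon \to 0} \frac{\varepsilon}{x^2 + \varepsilon^2} = \pi\delta \quad \text{(Breit-Wigner formula).}$$

This is actually a special case of a delta sequence: The function $h(x) = \frac{1}{1 + x^2}$
satisfies $0 \leq h \in L^1(\mathbb{R})$ and $\int_{\mathbb{R}} h(x)\,dx = \pi$. Thus, one can take $h_j(x) =
j h(jx) = f_\varepsilon(x)$ for $\varepsilon = \frac{1}{j}$ and apply the second result on delta sequences to $h/\pi$.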

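4. Closely related to the Breit-Wigner formula is the Sokhotski-Plemelj formula.
It reads


$$\lim_{\varepsilon \to 0} \frac{1}{x \pm i\varepsilon} = \mp i\pi\delta + \mathrm{vp}\frac{1}{x} \quad \text{in } \mathcal{D}'(\mathbb{R}). \tag{3.10}$$


Both formulas are used quite often in quantum mechanics.
For any $\varepsilon > 0$ we have


$$\frac{1}{x \pm i\varepsilon} = \operatorname{Re} \frac{1}{x \pm i\varepsilon} + i \operatorname{Im} \frac{1}{x \pm i\varepsilon}$$

where

$$\operatorname{Re} \frac{1}{x \pm i\varepsilon} = \frac{x}{x^2 + \varepsilon^2} \equiv g_\varepsilon(x), \qquad
\operatorname{Im} \frac{1}{x \pm i\varepsilon} = \mp \frac{\varepsilon}{x^2 + \varepsilon^2} = \mp f_\varepsilon(x).$$


The limit of $f_\varepsilon$ for $\varepsilon \to 0$ has been determined for the Breit-Wigner formula. To
find the limit for the functions $g_\varepsilon$ note first that $g_\varepsilon$ is not integrable on $\mathbb{R}$. It
is only locally integrable. Take any $\phi \in \mathcal{D}(\mathbb{R})$ and observe that the functions $g_\varepsilon$
are odd. Thus, we get


$$\langle I_{g_\varepsilon}, \phi \rangle = \int_{\mathbb{R}} g_\varepsilon(x)\phi(x)\,dx = \int_0^\infty g_\varepsilon(x)[\phi(x) - \phi(-x)]\,dx.$$

Rewrite the integrand as


$$g_\varepsilon(x)[\phi(x) - \phi(-x)] = x g_\varepsilon(x)\, \frac{\phi(x) - \phi(-x)}{x}$$


and observe that the function $\frac{\phi(x) - \phi(-x)}{x}$ belongs to $L^1(\mathbb{R})$ while the functions
$x g_\varepsilon(x)$ are bounded on $\mathbb{R}$ by 1 and converge, for $x \neq 0$, pointwise to 1 as $\varepsilon \to 0$.
Lebesgue’s dominated convergence theorem thus implies that


$$\lim_{\varepsilon \to 0} \int_{\mathbb{R}} g_\varepsilon(x)\phi(x)\,dx = \int_0^\infty \frac{\phi(x) - \phi(-x)}{x}\,dx,$$


or


$$\lim_{\varepsilon \to 0} \frac{x}{x^2 + \varepsilon^2} = \mathrm{vp}\frac{1}{x} \quad \text{in } \mathcal{D}'(\mathbb{R}) \tag{3.11}$$

where we have taken Eq. (3.8) into account. Equation (3.11) and the Breit-Wigner
formula together easily imply the Sokhotski-Plemelj formula.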
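
Two of the limits in Example 3.1 can be observed numerically. The following sketch pairs $\sin(jx)$ and the Breit-Wigner functions $f_\varepsilon$ with bump test functions by a midpoint rule; the test functions, the values of $j$ and $\varepsilon$, and the quadrature are illustrative assumptions, not part of the text.

```python
import math

def rho(x):
    """Bump profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def midpoint(g, a, b, n=100_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def pair_sin(j):
    """<I_{sin(jx)}, phi> for the shifted bump phi(x) = rho(x - 0.3)."""
    return midpoint(lambda x: math.sin(j * x) * rho(x - 0.3), -0.7, 1.3)

def pair_breit_wigner(eps):
    """<I_{f_eps}, rho> with f_eps(x) = eps / (x^2 + eps^2)."""
    return midpoint(lambda x: eps / (x * x + eps * eps) * rho(x), -1.0, 1.0)

# Item 1: partial integration gives |<T_j, phi>| <= (1/j) * integral of |phi'| = 2 * rho(0) / j.
sin_values = {j: pair_sin(j) for j in (1, 10, 50)}

# Item 3: (3.9) predicts the limit pi * phi(0) as eps -> 0.
target = math.pi * rho(0.0)
errors = {eps: abs(pair_breit_wigner(eps) - target) for eps in (0.1, 0.01)}
print(sin_values, errors)
```

The pairings with $\sin(jx)$ stay below the $1/j$ bound from the partial integration, and the distance of the Breit-Wigner pairing from $\pi\phi(0)$ shrinks with $\varepsilon$, roughly linearly for this test function.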


These concrete examples illustrate various practical aspects which have to be addressed
in the proof of convergence of sequences of distributions. Now we formulate
a fairly general and powerful result which simplifies the convergence proofs for
sequences of distributions in an essential way: It says that for the convergence
of a sequence of distributions, it suffices to show that this sequence is a Cauchy
sequence, i.e., the space of distributions equipped with the weak topology is sequentially
complete. Because of the great importance of this result we present a detailed
proof.


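Theorem 3.3 Equip the space of distributions $\mathcal{D}'(\Omega)$ on an open nonempty set
$\Omega \subset \mathbb{R}^n$ with the weak topology $\sigma = \sigma(\mathcal{D}'(\Omega), \mathcal{D}(\Omega))$. Then $\mathcal{D}'(\Omega)$ is a sequentially
complete HLCTVS.

In particular, for any sequence $(T_i)_{i \in \mathbb{N}} \subset \mathcal{D}'(\Omega)$ such that for each $\phi \in \mathcal{D}(\Omega)$
the numerical sequence $(T_i(\phi))_{i \in \mathbb{N}}$ converges, there are, for each compact subset
$K \subset \Omega$, a constant $C$ and an integer $m \in \mathbb{N}$ such that


$$|T_i(\phi)| \leq C\, p_{K,m}(\phi) \quad \forall \phi \in \mathcal{D}_K(\Omega),\ \forall i \in \mathbb{N}; \tag{3.12}$$


i.e., the sequence $(T_i)_{i \in \mathbb{N}}$ is equicontinuous on $\mathcal{D}_K(\Omega)$ for each compact set $K \subset \Omega$.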


Proof Since its topology is defined in terms of a system of seminorms, the space
of all distributions on $\Omega$ is certainly a locally convex topological vector space. Now
given $T \in \mathcal{D}'(\Omega)$, $T \neq 0$, there is a $\phi \in \mathcal{D}(\Omega)$ such that $T(\phi) \neq 0$, thus $\rho_\phi(T) =
|T(\phi)| > 0$ and Proposition 2.2 implies that the weak topology is Hausdorff, hence
$\mathcal{D}'(\Omega)$ is an HLCTVS.

In order to prove sequential completeness, we take any Cauchy sequence $(T_i)_{i \in \mathbb{N}}$
in $\mathcal{D}'(\Omega)$ and construct an element $T \in \mathcal{D}'(\Omega)$ to which this sequence converges.

For any $\phi \in \mathcal{D}(\Omega)$ we know (by definition of a Cauchy sequence) $(T_i(\phi))_{i \in \mathbb{N}}$ to be
a Cauchy sequence in the field $\mathbb{K}$ which is complete. Hence, this Cauchy sequence of
numbers converges to some number which we call $T(\phi)$. Since this argument applies
to any $\phi \in \mathcal{D}(\Omega)$, we can define a function $T : \mathcal{D}(\Omega) \to \mathbb{K}$ by


$$T(\phi) = \lim_{i \to \infty} T_i(\phi) \quad \forall \phi \in \mathcal{D}(\Omega).$$


Since each $T_i$ is linear, basic rules of calculation for limits of convergent sequences
of numbers imply that the limit function $T$ is linear too.

In order to show continuity of this linear functional $T$ it suffices, according to
Theorem 3.1, to show that $T_K = T|\mathcal{D}_K(\Omega)$ is continuous on $\mathcal{D}_K(\Omega)$ for every
compact subset $K \subset \Omega$. This is done by constructing a neighborhood $U$ of zero in
$\mathcal{D}_K(\Omega)$ on which $T$ is bounded and by using Corollary 2.1 to deduce continuity.

Since $T_i$ is continuous on $\mathcal{D}_K(\Omega)$, we know that


$$U_i = \{ \phi \in \mathcal{D}_K(\Omega) : |T_i(\phi)| \leq 1 \}$$


is a closed absolutely convex neighborhood of zero in $\mathcal{D}_K(\Omega)$ (see also the Exercises).
Now define

$$U = \bigcap_{i=1}^{\infty} U_i$$


and observe that $U$ is a closed absolutely convex set on which the functional $T$ is
bounded by 1. Hence, in order to deduce continuity of $T$, one has to show that $U$ is
actually a neighborhood of zero in $\mathcal{D}_K(\Omega)$. This part is indeed the core of the proof
which relies on some fundamental properties of the space $\mathcal{D}_K(\Omega)$ which are proven
in the Appendix.

Take any $\phi \in \mathcal{D}_K(\Omega)$; since the sequence $(T_i(\phi))_{i \in \mathbb{N}}$ converges, it is bounded
and there is an $n = n(\phi) \in \mathbb{N}$ such that $|T_i(\phi)| \leq n$ for all $i \in \mathbb{N}$. It follows that
$|T(\phi)| = \lim_{i \to \infty} |T_i(\phi)| \leq n$ and thus $\phi = n \cdot \frac{1}{n}\phi \in nU$. Since $\phi$ was arbitrary in
$\mathcal{D}_K(\Omega)$, this proves

$$\mathcal{D}_K(\Omega) = \bigcup_{n=1}^{\infty} nU.$$


In Proposition 2.4 it is shown that $\mathcal{D}_K(\Omega)$ is a complete metrizable HLCTVS. Hence
the theorem of Baire (see Appendix, Theorem C.3) applies to this space, and it
follows that one of the sets $nU$ and hence $U$ itself must have a nonempty interior.
This means that some open ball $B = \phi_0 + B_{p,r} = \phi_0 + \{\phi \in \mathcal{D}_K(\Omega) : p(\phi) < r\}$ is
contained in the set $U$. Here $\phi_0$ is some element in $U$, $r$ some positive number and
$p = p_{K,m}$ is some continuous seminorm of the space $\mathcal{D}_K(\Omega)$. Since $T$ is bounded
on $U$ by 1 it is bounded on the neighborhood of zero $B_{p,r}$ by $1 + |T(\phi_0)|$ and thus $T$
is continuous.


All elements $T_i$ of the sequence and the limit element $T$ are bounded on this neighborhood $U$
by 1. From the above it follows that there are a constant $C$ and some integer $m \in \mathbb{N}$
such that

$$|T_i(\phi)| \leq C\, p_{K,m}(\phi) \quad \forall \phi \in \mathcal{D}_K(\Omega),\ \forall i \in \mathbb{N};$$


i.e., the sequence $(T_i)_{i \in \mathbb{N}}$ is equicontinuous on $\mathcal{D}_K(\Omega)$ for each compact set $K \subset \Omega$,
and we conclude.

The convergence of a series of distributions is defined in the usual way through 
convergence of the corresponding sequence of partial sums. This can easily be 
translated into the following concrete formulation. 


Definition 3.4 Given a sequence $(T_i)_{i \in \mathbb{N}}$ of distributions on a nonempty open set
$\Omega \subset \mathbb{R}^n$ one says that the series $\sum_{i \in \mathbb{N}} T_i$ converges if, and only if, there is a
$T \in \mathcal{D}'(\Omega)$ such that for every $\phi \in \mathcal{D}(\Omega)$ the numerical series $\sum_{i \in \mathbb{N}} T_i(\phi)$ converges
to the number $T(\phi)$.

As a first important application of Theorem 3.3, one has a rather convenient 
characterization of the convergence of a series of distributions. 


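Corollary 3.2 A series $\sum_{i \in \mathbb{N}} T_i$ of distributions $T_i \in \mathcal{D}'(\Omega)$ converges if, and only
if, for every $\phi \in \mathcal{D}(\Omega)$ the numerical series $\sum_{i \in \mathbb{N}} T_i(\phi)$ converges.

As a simple example consider the distributions $T_i = c_i \delta_{ia}$ for some $a > 0$ and
any sequence of numbers $c_i$. Then the series


$$\sum_{i \in \mathbb{N}} c_i \delta_{ia}$$


converges in $\mathcal{D}'(\mathbb{R})$. The proof is simple. For every $\phi \in \mathcal{D}(\mathbb{R})$ one has


$$\sum_{i \in \mathbb{N}} T_i(\phi) = \sum_{i \in \mathbb{N}} c_i \phi(ia) = \sum_{i=1}^{m} c_i \phi(ia)$$


for some $m \in \mathbb{N}$ depending on the support of the test function $\phi$ (for $i > m$ the
point $ia$ is not contained in $\operatorname{supp} \phi$).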
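
That the series converges for any coefficients, however fast they grow, can be seen in a small experiment. The sketch below assumes a hypothetical test function with support $[-3, 3]$, spacing $a = 0.7$, and coefficients $c_i = 2^i$, with the index starting at $i = 1$ as in the last sum above.

```python
import math

def rho(x):
    """Bump profile exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def phi(x):
    """Hypothetical test function with support [-3, 3]."""
    return rho(x / 3.0)

a = 0.7   # spacing of the points i * a (illustrative)

def c(i):
    """Rapidly growing coefficients; no growth condition is needed for convergence."""
    return 2.0 ** i

def partial_sum(n):
    """Value at phi of the n-th partial sum: sum over i = 1..n of c_i * phi(i * a)."""
    return sum(c(i) * phi(i * a) for i in range(1, n + 1))

m = 4                       # i * a <= 3 only for i <= 4, so later terms vanish
s_m = partial_sum(m)
s_big = partial_sum(500)    # identical to s_m: all additional terms are exactly zero
print(s_m, s_big)
```

For each fixed test function the numerical series is a finite sum, so Corollary 3.2 applies without any assumption on the $c_i$.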


3.4 Localization of Distributions 


Distributions on a nonempty open set $\Omega \subset \mathbb{R}^n$ have been defined as continuous
linear functionals on the test function space $\mathcal{D}(\Omega)$ over $\Omega$ but not directly in points
of $\Omega$. Nevertheless we consider these distributions to be localized. In this section we
explain in which sense this localization is understood.

Suppose $\Omega_1 \subset \Omega_2 \subset \mathbb{R}^n$. Then every test function $\phi \in \mathcal{D}(\Omega_1)$ vanishes in
a neighborhood of the boundary of $\Omega_1$ and thus can be continued by 0 to $\Omega_2$ to
give a compactly supported test function $i_{\Omega_2,\Omega_1}(\phi)$ on $\Omega_2$. This defines a mapping
$i_{\Omega_2,\Omega_1} : \mathcal{D}(\Omega_1) \to \mathcal{D}(\Omega_2)$ which is evidently linear and continuous. Thus, we can
consider $\mathcal{D}(\Omega_1)$ to be embedded into $\mathcal{D}(\Omega_2)$ as $i_{\Omega_2,\Omega_1}(\mathcal{D}(\Omega_1))$, i.e.,


$$i_{\Omega_2,\Omega_1}(\mathcal{D}(\Omega_1)) \subset \mathcal{D}(\Omega_2).$$


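Hence, every continuous linear functional $T$ on $\mathcal{D}(\Omega_2)$ defines also a continuous
linear functional $T \circ i_{\Omega_2,\Omega_1} = \rho_{\Omega_1,\Omega_2}(T)$ on $\mathcal{D}(\Omega_1)$. Therefore, every distribution $T$
on $\Omega_2$ can be restricted to any open nonempty subset $\Omega_1$ by


$$T|\Omega_1 = \rho_{\Omega_1,\Omega_2}(T). \tag{3.13}$$


In particular this allows us to express the fact that a distribution $T$ on $\Omega_2$ vanishes
on an open subset $\Omega_1$: $\rho_{\Omega_1,\Omega_2}(T) = 0$, or in concrete terms


$$T \circ i_{\Omega_2,\Omega_1}(\phi) = 0 \quad \forall \phi \in \mathcal{D}(\Omega_1).$$


For convenience of notation the trivial extension map $i_{\Omega_2,\Omega_1}$ is usually omitted and
one writes

$$T(\phi) = 0 \quad \forall \phi \in \mathcal{D}(\Omega_1)$$


to express the fact that a distribution $T$ on $\Omega_2$ vanishes on the open subset $\Omega_1$. As a
slight extension we state: Two distributions $T_1$ and $T_2$ on $\Omega_2$ agree on an open subset
$\Omega_1$ if, and only if,

$$\rho_{\Omega_1,\Omega_2}(T_1) = \rho_{\Omega_1,\Omega_2}(T_2)$$


or in more convenient notation if, and only if,

$$T_1(\phi) = T_2(\phi) \quad \forall \phi \in \mathcal{D}(\Omega_1).$$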


The support of a function $f : \Omega \to \mathbb{K}$ is defined as the closure of the set of those
points in which the function does not vanish, or equivalently as the complement of
the largest open subset of $\Omega$ on which $f$ vanishes. The above preparations thus allow
us to define the support of a distribution $T$ on $\Omega$ as the complement of the largest
open subset $\Omega_1 \subset \Omega$ on which $T$ vanishes. The support of $T$ is denoted by $\operatorname{supp} T$.
It is characterized by the formula


$$\operatorname{supp} T = \bigcap_{A \in \mathcal{C}_T} A \tag{3.14}$$

where $\mathcal{C}_T$ denotes the set of all closed subsets $A$ of $\Omega$ such that $T$ vanishes on $\Omega \setminus A$.
Accordingly a point $x \in \Omega$ belongs to the support of the distribution $T$ on $\Omega$ if, and
only if, $T$ does not vanish in every open neighborhood $U$ of $x$, i.e., for every open
neighborhood $U$ of $x$ there is a $\phi \in \mathcal{D}(U)$ such that $T(\phi) \neq 0$.

In the Exercises one shows that this concept of support of distributions is compatible
with the embedding of functions and the support defined for functions, i.e.,
one shows

$$\operatorname{supp} I_f = \operatorname{supp} f \quad \text{for all } f \in L^1_{\mathrm{loc}}(\Omega).$$


A simple example shows that distributions can have a support consisting of one point:
The support of the distribution $T$ on $\Omega$ defined by


$$T(\phi) = \sum_{|\alpha| \leq m} c_\alpha D^\alpha \phi(x_0) \tag{3.15}$$


is the point $x_0 \in \Omega$, for any choice of the constants $c_\alpha$ (not all zero) and any $m \in \mathbb{N}$. If a distribution
is of the form (3.15) then certainly $T(\phi) = 0$ for all $\phi \in \mathcal{D}(\mathbb{R}^n \setminus \{x_0\})$ since such test
functions vanish in a neighborhood of $x_0$ and thus all derivatives vanish there. And,
if not all coefficients $c_\alpha$ vanish, there are, in any neighborhood $U$ of the point $x_0$, test
functions $\phi \in \mathcal{D}(U)$ such that $T(\phi) \neq 0$. This claim is addressed in the Exercises.

Furthermore, this formula actually gives the general form of a distribution whose
support is the point $x_0$. We show this later in Proposition 4.7.

Since we have learned above when two distributions on $\Omega$ agree on an open
subset, we know in particular when a distribution is equal to a $\mathcal{C}^\infty$-function, or more
precisely when a distribution is equal to the regular distribution defined by a $\mathcal{C}^\infty$-function,
on some open subset. This is used in the definition of the singular support
of a distribution, which seems somewhat ad hoc but which has proved itself to be
quite useful in the analysis of constant coefficient partial differential operators.


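Definition 3.5 Let $T$ be a distribution on a nonempty open set $\Omega \subseteq \mathbb{R}^n$. The
singular support of $T$, denoted $\operatorname{sing\,supp} T$, is the smallest closed subset of $\Omega$ in
the complement of which $T$ is equal to a $C^\infty$-function.

We mention a simple one-dimensional example, Cauchy's principal value $\operatorname{vp}\frac{1}{x}$.
In the discussion following formula (3.8) we saw that $\operatorname{vp}\frac{1}{x} = I_{1/x}$ on $\mathbb{R} \setminus \{0\}$. Since
$\frac{1}{x}$ is a $C^\infty$-function on $\mathbb{R} \setminus \{0\}$, $\operatorname{sing\,supp} \operatorname{vp}\frac{1}{x} \subseteq \{0\}$. And since $\{0\}$ is obviously the
smallest closed subset of $\mathbb{R}$ outside which the Cauchy principal value is equal to a
$C^\infty$-function, it follows that

$$\operatorname{sing\,supp} \operatorname{vp}\frac{1}{x} = \{0\}.$$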


3.5 Tempered Distributions and Distributions with Compact 
Support 


Tempered distributions form the class of distributions on which the Fourier transform acts as an
isomorphism of topological vector spaces; accordingly, we will later devote a separate
chapter to Fourier transformation and tempered distributions. This section just gives
the basic definitions and properties of tempered distributions and distributions with
compact support.

Recall the beginning of the section on the definition of distributions. What has 
been done there for general distributions will be done here for the subclasses of 
tempered and compactly supported distributions. 


Definition 3.6 A tempered distribution $T$ on an open nonempty subset $\Omega \subseteq \mathbb{R}^n$
is a continuous linear functional on the test function space $\mathcal{S}(\Omega)$ of strongly
decreasing $C^\infty$-functions on $\Omega$. The set of all tempered distributions on $\Omega$ equals
the topological dual $\mathcal{S}'(\Omega)$ of $\mathcal{S}(\Omega)$.

In analogy with Theorem 3.1, we have the following explicit characterization of 
tempered distributions. 




Theorem 3.4 A linear functional $T : \mathcal{S}(\Omega) \to \mathbb{K}$ is a tempered distribution on the
open nonempty set $\Omega \subseteq \mathbb{R}^n$ if, and only if, there exist a number $C \in \mathbb{R}^+$ and natural
numbers $m, k \in \mathbb{N}$, depending on $T$, such that for all $\phi \in \mathcal{S}(\Omega)$ the estimate

$$|T(\phi)| \le C\, p_{m,k}(\phi) \tag{3.16}$$

holds.


Proof Recall the definition of the filtering system of norms of the space $\mathcal{S}(\Omega)$ and the
condition of boundedness for a linear function $T : \mathcal{S}(\Omega) \to \mathbb{K}$. Then it is clear that
the above estimate characterizes $T$ as being bounded on $\mathcal{S}(\Omega)$. Thus, by Theorem
2.4, this estimate characterizes continuity and we conclude.

According to relation (3.2), we know that every tempered distribution is a distribution
and therefore all results established for distributions apply to tempered
distributions. Also, the basic definitions of convergence and of a Cauchy sequence
are formally the same as soon as we replace the test function space $\mathcal{D}(\Omega)$ by the
larger test function space $\mathcal{S}(\Omega)$ and the topological dual $\mathcal{D}'(\Omega)$ of $\mathcal{D}(\Omega)$ by the
topological dual $\mathcal{S}'(\Omega)$ of $\mathcal{S}(\Omega)$. Hence, we do not repeat these definitions, but we
formulate the important counterpart of Theorem 3.3 explicitly.


Theorem 3.5 Equip the space $\mathcal{S}'(\Omega)$ of tempered distributions on
an open nonempty set $\Omega \subseteq \mathbb{R}^n$ with the weak topology $\sigma = \sigma(\mathcal{S}'(\Omega), \mathcal{S}(\Omega))$. Then
$\mathcal{S}'(\Omega)$ is a sequentially complete HLCTVS.


Proof As in the proof of Theorem 3.3 one sees that $\mathcal{S}'(\Omega)$ is an HLCTVS. By
this theorem one also knows that a Cauchy sequence in $\mathcal{S}'(\Omega)$ converges to some
distribution $T$ on $\Omega$. In order to show that $T$ is actually tempered, one proves that $T$
is bounded on some open ball in $\mathcal{S}(\Omega)$. Since $\mathcal{S}(\Omega)$ is a complete metrizable space
this can be done as in the proof of Theorem 3.3. Thus we conclude.

Finally, we discuss briefly the space of distributions of compact support. Recall
that a distribution $T \in \mathcal{D}'(\Omega)$ is said to have a compact support if there is a compact
set $K \subset \Omega$ such that $T(\phi) = 0$ for all $\phi \in \mathcal{D}(\Omega \setminus K)$. The smallest of the compact
subsets $K$ for which this condition holds is called the support of $T$, denoted by $\operatorname{supp} T$.
As we are going to explain now, distributions of compact support can be characterized
topologically as elements of the topological dual of the test function space $\mathcal{E}(\Omega)$.
According to (C.3), the space $\mathcal{E}(\Omega)$ is the space $C^\infty(\Omega)$ equipped with the filtering
system of semi-norms $\mathcal{P}_\infty(\Omega) = \{p_{K,m} : K \subset \Omega \text{ compact},\ m = 0, 1, 2, \ldots\}$.
Hence a linear function $T : \mathcal{E}(\Omega) \to \mathbb{K}$ is continuous if, and only if, there are a
compact set $K \subset \Omega$, a constant $C \in \mathbb{R}^+$ and an integer $m$ such that

$$|T(\phi)| \le C\, p_{K,m}(\phi) \quad \forall\, \phi \in \mathcal{E}(\Omega). \tag{3.17}$$

Now suppose $T \in \mathcal{E}'(\Omega)$ is given. Then $T$ satisfies condition (3.17) and by relation
(3.2) we know that $T$ is a distribution on $\Omega$. Take any $\phi \in \mathcal{D}(\Omega \setminus K)$. Then $\phi$
vanishes in some open neighborhood $U$ of $K$ and thus $D^\alpha \phi(x) = 0$ for all $x \in K$
and all $\alpha \in \mathbb{N}^n$. It follows that $p_{K,m}(\phi) = 0$ and thus $T(\phi) = 0$ for all $\phi \in \mathcal{D}(\Omega \setminus K)$,
hence $\operatorname{supp} T \subseteq K$. This shows that elements in $\mathcal{E}'(\Omega)$ are distributions with compact
support.




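Conversely, suppose that $T \in \mathcal{D}'(\Omega)$ has a support contained in a compact set
$K \subset \Omega$. There are functions $u \in \mathcal{D}(\Omega)$ which are equal to 1 in an open neighborhood
of $K$ and which have their support in a slightly larger compact set $K'$ (see Exercises).
It follows that $(1 - u) \cdot \phi \in \mathcal{D}(\Omega \setminus K)$ and therefore $T((1 - u) \cdot \phi) = 0$, or $T(\phi) = T(u \cdot \phi)$,
for all $\phi \in \mathcal{D}(\Omega)$. For any $\psi \in \mathcal{E}(\Omega)$, one knows $u \cdot \psi \in \mathcal{D}_{K'}(\Omega)$ and thus
$T_0(\psi) = T(u \cdot \psi)$ is a well-defined linear function $\mathcal{E}(\Omega) \to \mathbb{K}$. (If $v \in \mathcal{D}(\Omega)$
is another function which is equal to 1 in some open neighborhood of $K$, then
$u \cdot \psi - v \cdot \psi \in \mathcal{D}(\Omega \setminus K)$ and therefore $T(u \cdot \psi - v \cdot \psi) = 0$.) Since $T$ is a
distribution, there are a constant $C \in \mathbb{R}^+$ and $m \in \mathbb{N}$ such that $|T(\phi)| \le C\, p_{K',m}(\phi)$
for all $\phi \in \mathcal{D}_{K'}(\Omega)$. For all $\psi \in \mathcal{E}(\Omega)$ we thus get

$$|T_0(\psi)| = |T(u \cdot \psi)| \le C\, p_{K',m}(u \cdot \psi) \le C c\, p_{K',m}(u)\, p_{K',m}(\psi),$$

where $c$ is the constant, depending only on $m$ and $n$, which the Leibniz formula produces when the semi-norm of a product is estimated.
This shows that $T_0$ is continuous on $\mathcal{E}(\Omega)$, i.e., $T_0 \in \mathcal{E}'(\Omega)$. On $\mathcal{D}(\Omega)$ the functionals
$T_0$ and $T$ agree: $T_0(\phi) = T(u \cdot \phi) = T(\phi)$ for all $\phi \in \mathcal{D}(\Omega)$ as we have seen above
and therefore we can formulate the following result.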


Theorem 3.6 The topological dual $\mathcal{E}'(\Omega)$ of the test function space $\mathcal{E}(\Omega)$ equals
the space of distributions on $\Omega$ which have a compact support. Equipped with the
weak topology $\sigma = \sigma(\mathcal{E}'(\Omega), \mathcal{E}(\Omega))$ the space $\mathcal{E}'(\Omega)$ of distributions with compact
support is a sequentially complete HLCTVS.


Proof The proof that $\mathcal{E}'(\Omega)$ is a sequentially complete HLCTVS is left as an exercise.
The other statements have been proven above.


3.6 Exercises 


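1. Let $f : \Omega \to \mathbb{R}$ be a continuous function on an open nonempty set $\Omega \subseteq \mathbb{R}^n$.
   Show: If $\int f(x)\phi(x)\,dx = 0$ for all $\phi \in \mathcal{D}(\Omega)$, then $f = 0$, i.e., the map
   $I : C(\Omega) \to \mathcal{D}'(\Omega)$ of Theorem 3.2 is injective. Deduce that $I$ is injective on all
   of $L^1_{\mathrm{loc}}(\Omega)$.

2. Prove: There is no $f \in L^1_{\mathrm{loc}}(\Omega)$ such that $\delta_a(\phi) = \int f(x)\phi(x)\,dx$ for all $\phi \in \mathcal{D}(\Omega)$.

   Hint: It suffices to consider the case $a = 0$. Then take the function $\rho : \mathbb{R}^n \to \mathbb{R}$ defined
   by

   $$\rho(x) = \begin{cases} 0 & \text{for } |x| \ge 1, \\ \exp\!\left(\dfrac{-1}{1 - x^2}\right) & \text{for } |x| < 1, \end{cases}$$

   and define $\rho_r(x) = \rho\!\left(\frac{x}{r}\right)$ for $r > 0$. Recall that $\rho_r \in \mathcal{D}(\Omega)$ and $\rho_r(x) = 0$ for all
   $x \in \mathbb{R}^n$ with $|x| \ge r$. Finally, observe that for $f \in L^1_{\mathrm{loc}}(\Omega)$ one has

   $$\lim_{r \to 0} \int_{|x| \le r} |f(x)|\,dx = 0.$$

3. Consider the hyperplane $H = \{x = (x_1, \ldots, x_n) \in \mathbb{R}^n : x_1 = 0\}$. Define a
   function $\delta_H : \mathcal{D}(\mathbb{R}^n) \to \mathbb{K}$ by

   $$\langle \delta_H, \phi \rangle = \int_{\mathbb{R}^{n-1}} \phi(0, x_2, \ldots, x_n)\,dx_2 \cdots dx_n \quad \forall\, \phi \in \mathcal{D}(\mathbb{R}^n).$$

   Show that $\delta_H$ is a distribution on $\mathbb{R}^n$. It is called Dirac's delta distribution on the
   hyperplane $H$.

4. For any point $a \in \Omega \subseteq \mathbb{R}^n$, $\Omega$ open and not empty, define a functional $T : \mathcal{D}(\Omega) \to \mathbb{K}$ by

   $$T(\phi) = \Delta\phi(a) = \sum_{i=1}^{n} \frac{\partial^2 \phi}{\partial x_i^2}(a) \quad \forall\, \phi \in \mathcal{D}(\Omega).$$

   Prove: $T$ is a distribution on $\Omega$ of order 2. On $\Omega \setminus \{a\}$ this distribution is equal to
   the regular distribution $I_0$ defined by the zero function.

5. Let $S_{n-1} = \{x \in \mathbb{R}^n : \sum_{i=1}^{n} x_i^2 = 1\}$ be the unit sphere in $\mathbb{R}^n$ and denote by $d\sigma$
   the uniform measure on $S_{n-1}$. The derivative in the direction of the outer normal
   of $S_{n-1}$ is denoted by $\frac{\partial}{\partial n}$. Now define a function $T : \mathcal{D}(\mathbb{R}^n) \to \mathbb{K}$ by

   $$\langle T, \phi \rangle = \int_{S_{n-1}} \frac{\partial \phi}{\partial n}\,d\sigma \quad \forall\, \phi \in \mathcal{D}(\mathbb{R}^n)$$

   and show that $T$ is a distribution on $\mathbb{R}^n$ of order 1 which is equal to the regular
   distribution $I_0$ on $\mathbb{R}^n \setminus S_{n-1}$.

6. Given a Cauchy sequence $(T_i)_{i \in \mathbb{N}}$ of distributions on a nonempty open set $\Omega \subseteq \mathbb{R}^n$,
   prove in detail that the (pointwise or weak) limit $T$ is a linear function $\mathcal{D}(\Omega) \to \mathbb{K}$.

7. Let $X[\mathcal{P}]$ be an HLCTVS, $T \in X'[\mathcal{P}]$ and $r > 0$. Show:

   $$U = \{x \in X : |T(x)| \le r\}$$

   is a closed absolutely convex neighborhood of zero.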


Chapter 4 
Calculus for Distributions 


This chapter deals with the basic parts of calculus, i.e., with differentiation of
distributions, multiplication of distributions with smooth functions and with other
distributions, and change of variables for distributions. There are other parts which
will be addressed in separate chapters since they play a prominent role in distribution
theory, viz., Fourier transform for a distinguished subclass of distributions and
convolution of distributions with functions and with other distributions.

It stands to reason that, when we define differentiation, multiplication, and variable
transformations for distributions, we insist that these definitions be consistent with
these operations on functions and the embedding of functions into the space of
distributions.

As preparation we mention a small but important observation. Let $\Omega \subseteq \mathbb{R}^n$ be
nonempty and open and $A : \mathcal{D}(\Omega) \to \mathcal{D}(\Omega)$ a continuous linear function of the
test function space on $\Omega$ into itself. Such a map induces a map on the space of
distributions on $\Omega$, $A' : \mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$, according to the formula

$$A'(T) = T \circ A \quad \forall\, T \in \mathcal{D}'(\Omega). \tag{4.1}$$

As a composition of two linear and continuous functions, $A'(T)$ is a continuous
linear function $\mathcal{D}(\Omega) \to \mathbb{K}$ and thus a distribution. Therefore, $A'$ is well defined
and is called the adjoint of $A$. Obviously $A' : \mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$ is linear, but
it is also continuous, since for every $\phi \in \mathcal{D}(\Omega)$ we have, for all $T \in \mathcal{D}'(\Omega)$,
$p_\phi(A'(T)) = p_{A(\phi)}(T)$ so that Definition 2.7 and Theorem 2.4 imply continuity.

The adjoint itself (or a slight modification thereof in order to ensure consistency
with the embedding of functions) will be used to define differentiation of
distributions, their multiplication and change of variables.


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_4 




4.1 Differentiation 


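Let $D^\alpha$ be a derivative monomial of order $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}^n$. It is certainly
a linear map $\mathcal{D}(\Omega) \to \mathcal{D}(\Omega)$ for any open nonempty set $\Omega \subseteq \mathbb{R}^n$. Continuity of
$D^\alpha : \mathcal{D}(\Omega) \to \mathcal{D}(\Omega)$ follows easily from the estimate

$$p_{K,m}(D^\alpha \phi) \le p_{K,m+|\alpha|}(\phi) \quad \forall\, \phi \in \mathcal{D}_K(\Omega)$$

in conjunction with Definition 2.7 and Theorem 2.4. Therefore, the adjoint of the
derivative monomial $D^\alpha$ is a continuous linear map $\mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$ and thus
appears to be a suitable candidate for the definition of the derivative of distributions.
However, since we insist on consistency of the definition with the embedding of
differentiable functions into the space of distributions, a slight adjustment has to be
made. To determine this adjustment take any $f \in C^1(\mathbb{R}^n)$ and calculate for every
$\phi \in \mathcal{D}(\mathbb{R}^n)$,

$$\langle I_{\partial_1 f}, \phi \rangle = \int_{\mathbb{R}^n} \frac{\partial f}{\partial x_1}(x)\,\phi(x)\,dx = \int_{\mathbb{R}^{n-1}} \left( \int_{\mathbb{R}} \frac{\partial f}{\partial x_1}(x)\,\phi(x)\,dx_1 \right) dx_2 \cdots dx_n$$

$$= -\int_{\mathbb{R}^{n-1}} \left( \int_{\mathbb{R}} f(x)\,\frac{\partial \phi}{\partial x_1}(x)\,dx_1 \right) dx_2 \cdots dx_n = -\langle I_f, \partial_1 \phi \rangle.$$

Here we use the abbreviation $\partial_1 = \frac{\partial}{\partial x_1}$; the boundary terms of the partial integration vanish because $\phi$ has compact support. Similarly, by repeated partial integration,
one obtains (see Exercises) for $f \in C^k(\mathbb{R}^n)$,

$$\langle I_{D^\alpha f}, \phi \rangle = (-1)^{|\alpha|} \langle I_f, D^\alpha \phi \rangle \quad \forall\, \phi \in \mathcal{D}(\mathbb{R}^n),\ \forall\, \alpha \in \mathbb{N}^n,\ |\alpha| \le k.$$

Denoting the derivative of order $\alpha$ on $\mathcal{D}'(\Omega)$ with the same symbol as for functions,
the condition of consistency with the embedding reads

$$D^\alpha I_f = I_{D^\alpha f} \quad \forall\, \alpha \in \mathbb{N}^n,\ |\alpha| \le k,\ \forall\, f \in C^k(\Omega).$$

Accordingly, one takes as the derivative monomial of order $\alpha$ on distributions the
following modification of the adjoint of the derivative monomial on functions.


Definition 4.1 The derivative of order $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}^n$ is defined on $\mathcal{D}'(\Omega)$
by the formula

$$D^\alpha T = (-1)^{|\alpha|}\, T \circ D^\alpha \quad \forall\, T \in \mathcal{D}'(\Omega),$$

i.e., for each $T \in \mathcal{D}'(\Omega)$ one has

$$\langle D^\alpha T, \phi \rangle = (-1)^{|\alpha|} \langle T, D^\alpha \phi \rangle \quad \forall\, \phi \in \mathcal{D}(\Omega).$$

There are a number of immediate powerful consequences of this definition. The proof
of these results is straightforward.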




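Theorem 4.1 Differentiation on the space of distributions $\mathcal{D}'(\Omega)$ on a nonempty
open set $\Omega \subseteq \mathbb{R}^n$ as defined in Definition 4.1 has the following properties:

1. Every distribution has derivatives of all orders and the order in which derivatives
   are calculated does not matter, i.e.,

   $$D^\alpha(D^\beta T) = D^\beta(D^\alpha T) = D^{\alpha+\beta} T \quad \forall\, T \in \mathcal{D}'(\Omega),\ \forall\, \alpha, \beta \in \mathbb{N}^n.$$

2. The local order of a distribution increases by the order of differentiation.
3. Differentiation on $\mathcal{D}'(\Omega)$ is consistent with the embedding of $C^\infty(\Omega)$ into $\mathcal{D}'(\Omega)$.
4. The derivative monomials $D^\alpha : \mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$ are linear and continuous,
   hence in particular
   a) If $T = \lim_{i \to \infty} T_i$ in $\mathcal{D}'(\Omega)$, then

      $$D^\alpha \left( \lim_{i \to \infty} T_i \right) = \lim_{i \to \infty} D^\alpha T_i.$$

   b) If a series $\sum_{i \in \mathbb{N}} T_i$ converges in $\mathcal{D}'(\Omega)$, then

      $$D^\alpha \sum_{i \in \mathbb{N}} T_i = \sum_{i \in \mathbb{N}} D^\alpha T_i.$$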

Proof The first part has been shown in the definition of the derivative for distributions.
The order of differentiation does not matter since on $C^\infty(\Omega)$ the order of
differentiation can be interchanged.

If for some compact set $K \subset \Omega$ we have $|T(\phi)| \le C\, p_{K,m}(\phi)$ for all $\phi \in \mathcal{D}_K(\Omega)$,
we get $|D^\alpha T(\phi)| = |T(D^\alpha \phi)| \le C\, p_{K,m}(D^\alpha \phi) \le C\, p_{K,m+|\alpha|}(\phi)$ and the second part
follows easily from the definition of the local order (Definition 3.2).

The consistency of the derivative for distributions with the embedding of
differentiable functions has been built into the definition.

Since the derivative $D^\alpha$ on $\mathcal{D}'(\Omega)$ equals the adjoint of the derivative on functions
multiplied by $(-1)^{|\alpha|}$, the continuity of the derivative follows immediately from that
of the adjoint of the linear continuous map on $\mathcal{D}(\Omega)$ as discussed after Eq. (4.1).


Remark 4.1 


1. Obviously, the fact that every distribution has derivatives of all orders comes
   from the definition of the test function space as a subspace of the space of all
   $C^\infty$-functions and the definition of a topology on this subspace which ensures
   that all derivative monomials are continuous.

2. In the sense of distributions, every locally integrable function has derivatives of
   all orders. But certainly, in general, the result will be a distribution and not a
   function. We mention a famous example. Consider the Heaviside function $\theta$ on
   the real line $\mathbb{R}$ defined by

   $$\theta(x) = \begin{cases} 0 & \text{for } x < 0, \\ 1 & \text{for } x > 0. \end{cases}$$

   $\theta$ is locally integrable and thus has a derivative in the sense of distributions which
   we calculate now. For all $\phi \in \mathcal{D}(\mathbb{R})$ one has

   $$\langle D I_\theta, \phi \rangle = -\langle I_\theta, D\phi \rangle = -\int_{\mathbb{R}} \theta(x)\,\phi'(x)\,dx = -\int_0^\infty \phi'(x)\,dx = \phi(0) = \langle \delta, \phi \rangle.$$

   This shows that $D I_\theta = \delta$, which is often written as

   $$\theta'(x) = \delta(x),$$

   i.e., the derivative (in the sense of distributions) of Heaviside's function equals
   Dirac's delta function.
   Some other examples of derivatives are given in the Exercises.

3. Part 4 of Theorem 4.1 represents a remarkable contrast to classical analysis. Recall
   the example of the sequence of $C^\infty$-functions $f_j(x) = \frac{1}{j} \sin(jx)$ on $\mathbb{R}$ which
   converges uniformly on $\mathbb{R}$ to the $C^\infty$-function 0, but for which the sequence of
   derivatives $f_j'(x) = \cos(jx)$ does not converge (not even pointwise). In the sense
   of distributions the sequence of derivatives also converges to 0: For all $\phi \in \mathcal{D}(\mathbb{R})$
   we have

   $$\langle D I_{f_j}, \phi \rangle = -\langle I_{f_j}, D\phi \rangle = -\frac{1}{j} \int \sin(jx)\,\phi'(x)\,dx \to 0$$

   as $j \to \infty$.
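The two examples of Remark 4.1 can be checked numerically. The following is only a minimal sketch under stated assumptions: the test function `bump` is the standard bump function from the Exercises of Chap. 3, while the helper names (`bump_prime`, `integrate`, `pair_sin`) and the midpoint quadrature are illustrative choices, not part of the text.

```python
import math


def bump(x: float) -> float:
    """Test function exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def bump_prime(x: float) -> float:
    """Classical derivative of bump (chain rule)."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2


def integrate(g, a: float, b: float, n: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


# <D I_theta, phi> = -int_0^infty phi'(x) dx; phi vanishes outside [-1, 1].
heaviside_pairing = -integrate(bump_prime, 0.0, 1.0)


def pair_sin(j: int) -> float:
    """<D I_{f_j}, phi> = -(1/j) int sin(jx) phi'(x) dx for f_j(x) = sin(jx)/j."""
    return -integrate(lambda x: math.sin(j * x) * bump_prime(x), -1.0, 1.0) / j


print(heaviside_pairing, bump(0.0))      # both approximate phi(0) = exp(-1)
print([pair_sin(j) for j in (1, 10, 100, 1000)])
```

The first printed pair illustrates $\langle D I_\theta, \phi \rangle = \phi(0)$; the list illustrates that the pairings with $D I_{f_j}$ tend to 0, in agreement with the bound $|\langle D I_{f_j}, \phi \rangle| \le \frac{1}{j} \int |\phi'(x)|\,dx$.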


One of the major goals in the development of distribution theory was to get a
suitable framework for solving linear partial differential equations with constant
coefficients. This goal has been achieved [6, 7]. Here we mention only a few elementary
aspects. Knowing the derivative monomials on $\mathcal{D}'(\Omega)$ we can consider linear
constant coefficient partial differential operators on this space, i.e., operators of the
form

$$P(D) = \sum_{|\alpha| \le k} a_\alpha D^\alpha \tag{4.2}$$

with certain coefficients $a_\alpha \in \mathbb{K}$ and $k = 1, 2, \ldots$. Now given $f \in C(\Omega)$ we can
consider the equation

$$P(D)u = f \tag{4.3}$$

in two ways: A classical or strong solution is a function $u \in C^k(\Omega)$ such that
this equation holds in the sense of functions. A distribution $T \in \mathcal{D}'(\Omega)$ for which
$P(D)T = I_f$ holds in $\mathcal{D}'(\Omega)$ is called a distributional or weak solution.

Since the space of distributions $\mathcal{D}'(\Omega)$ is much larger than the space $C^k(\Omega)$ of
$k$ times continuously differentiable functions, one expects that it is easier to find a solution
in this larger space. This expectation has been proven to be correct in many important
classes of problems. However in most cases, in particular in those arising from
physics, one does not look for a weak but for a classical solution. So it is very important
to have a theory which ensures that for special classes of partial differential
equations the weak solutions are actually classical ones. The so-called elliptic regularity
theory provides these results for "elliptic" partial differential equations
(see Part III). Here we discuss a very simple class of examples of this type.


Proposition 4.1 Suppose $T \in \mathcal{D}'(\mathbb{R})$ satisfies the constant coefficient ordinary
differential equation

$$D^n T = 0 \quad \text{in } \mathcal{D}'(\mathbb{R}).$$

Then $T$ is a polynomial $P_{n-1}$ of degree $\le n - 1$, i.e., $T = I_{P_{n-1}}$. Hence, the sets of
classical and of distributional solutions of this differential equation coincide.


Proof The proof is by induction on the order $n$ of this differential equation. Hence,
in a first step, we show: If a distribution $T \in \mathcal{D}'(\mathbb{R})$ satisfies $DT = T' = 0$, then $T$
is a constant, i.e., of the form $T = I_c$ for some constant $c$.

Choose some test function $\psi \in \mathcal{D}(\mathbb{R})$ which is normalized by the condition
$I(\psi) = \int \psi(x)\,dx = 1$. Next consider any test function $\phi \in \mathcal{D}(\mathbb{R})$. Associate with it
the auxiliary test function $\chi = \phi - I(\phi)\psi$ which has the property $I(\chi) = 0$. Hence,
$\chi$ is the derivative of a test function $\rho$ defined by $\rho(x) = \int_{-\infty}^{x} \chi(y)\,dy$, $\rho' = \chi$ (see
Exercises). $T' = 0$ in $\mathcal{D}'(\mathbb{R})$ implies that

$$T(\chi) = T(\rho') = -T'(\rho) = 0$$

and therefore

$$T(\phi) = T(\psi)\,I(\phi) = I_c(\phi)$$

with the constant $c = T(\psi)$.

Now suppose that the conclusion of the proposition holds for some $n \ge 1$. We are
going to show that then this conclusion also holds for $n + 1$.

Assume $D^{n+1} T = 0$ in $\mathcal{D}'(\mathbb{R})$. It follows that $D(D^n T) = 0$ in $\mathcal{D}'(\mathbb{R})$ and hence
$D^n T = I_c$ for some constant $c$. In the Exercises, we show the identity $I_c = D^n I_{P_n}$
where $P_n$ is a polynomial of degree $n$ of the form $P_n(x) = \frac{c}{n!} x^n + P_{n-1}(x)$. Here $P_{n-1}$
is any polynomial of degree $\le n - 1$. Therefore, $D^n T = D^n I_{P_n}$ or $D^n(T - I_{P_n}) = 0$
in $\mathcal{D}'(\mathbb{R})$. The induction hypothesis implies that

$$T - I_{P_n} = I_{Q_{n-1}}$$

for some polynomial $Q_{n-1}$ of degree $\le n - 1$ and we conclude that $T$ is a polynomial
of degree $\le n$.

In the Exercises, we will also show that any classical solution is also a
distributional solution.


4.2 Multiplication 


As is well known from classical analysis, the (pointwise) product of two continuous
functions $f, g \in C(\Omega)$, defined by $(f \cdot g)(x) = f(x)g(x)$ for all $x \in \Omega$, is again a
continuous function on $\Omega$. Similarly, the product of two continuously differentiable
functions $f, g \in C^1(\Omega)$ is again a continuously differentiable function, due to the
product rule of differentiation.

However, the product of two locally integrable functions $f, g \in L^1_{\mathrm{loc}}(\Omega)$ is in
general not a locally integrable function. As a typical case we mention: $f \cdot g$ is not
locally integrable when both functions have a sufficiently strong singularity at the
same point. A simple example is the function

$$f(x) = \begin{cases} +\infty & \text{for } x = 0, \\ \dfrac{1}{\sqrt{|x|}} & \text{for } x \neq 0. \end{cases}$$

Obviously $f \in L^1_{\mathrm{loc}}(\mathbb{R})$, but $f \cdot f = f^2$ is not locally integrable. Nevertheless, the
product of two locally integrable functions which have a singularity at the same point
will be locally integrable if these singularities are sufficiently weak; for example,
take the function

$$g(x) = \begin{cases} +\infty & \text{for } x = 0, \\ \dfrac{1}{|x|^s} & \text{for } x \neq 0, \end{cases}$$

for some exponent $s > 0$. If $2s < 1$, then $g^2$ is locally integrable on $\mathbb{R}$.
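The dichotomy above can be made concrete with the elementary integral $\int_\varepsilon^1 x^{-p}\,dx$, which stays bounded as $\varepsilon \to 0$ exactly when $p < 1$. The following minimal sketch evaluates its closed form; the function name `tail_integral` and the sample exponent $s = 0.4$ are illustrative assumptions.

```python
import math


def tail_integral(p: float, eps: float) -> float:
    """Exact value of the integral of x**(-p) over [eps, 1], for 0 < eps < 1."""
    if p == 1.0:
        return -math.log(eps)
    return (1.0 - eps ** (1.0 - p)) / (1.0 - p)


for eps in (1e-2, 1e-4, 1e-8):
    print(
        eps,
        tail_integral(0.5, eps),  # f = |x|^(-1/2): bounded, limit 2
        tail_integral(1.0, eps),  # f^2 = |x|^(-1): grows like -log(eps)
        tail_integral(0.8, eps),  # g^2 with s = 0.4: bounded, limit 1/(1 - 2s) = 5
    )
```

The second column stays below 2, the third grows without bound, and the fourth approaches 5 (slowly, since the exponent $0.8$ is close to the critical value 1).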

On the other hand, there are many subspaces of $L^1_{\mathrm{loc}}(\Omega)$ with the property that any
element in this subspace can multiply any element in $L^1_{\mathrm{loc}}(\Omega)$ such that the product is
again in $L^1_{\mathrm{loc}}(\Omega)$ ($\Omega \subseteq \mathbb{R}^n$ open and nonempty), for instance the subspace $C(\Omega)$ of
continuous functions on $\Omega$ or the bigger subspace $L^\infty_{\mathrm{loc}}(\Omega)$ of those functions which
are essentially bounded on every compact subset $K \subset \Omega$.

These few examples show that in spaces of functions whose elements can have
singularities the multiplication cannot be done in general. Accordingly, we cannot
expect to have unrestricted multiplication in the space $\mathcal{D}'(\Omega)$ of distributions and
therefore only some special but important cases of multiplication for distributions
are discussed.


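Proposition 4.2 In the space $\mathcal{D}'(\Omega)$ of distributions on a nonempty open set $\Omega \subseteq \mathbb{R}^n$,
multiplication with $C^\infty$-functions is well defined by

$$(u \cdot T)(\phi) = T(u \cdot \phi) \quad \forall\, \phi \in \mathcal{D}(\Omega)$$

for every $T \in \mathcal{D}'(\Omega)$ and every $u \in C^\infty(\Omega)$. This product has the following
properties:

1. For fixed $u \in C^\infty(\Omega)$ the map $T \mapsto u \cdot T$ is linear and continuous on $\mathcal{D}'(\Omega)$.
2. The product rule of differentiation holds:

   $$\frac{\partial}{\partial x_j}(u \cdot T) = \frac{\partial u}{\partial x_j} \cdot T + u \cdot \frac{\partial T}{\partial x_j} \tag{4.4}$$

   for all $j = 1, \ldots, n$, all $u \in C^\infty(\Omega)$, and all $T \in \mathcal{D}'(\Omega)$.
3. This multiplication is compatible with the embedding of functions, i.e.,

   $$u \cdot I_f = I_{uf}$$

   for all $u \in C^\infty(\Omega)$ and all $f \in L^1_{\mathrm{loc}}(\Omega)$.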
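Proof For each $u \in C^\infty(\Omega)$ introduce the mapping $M_u : \mathcal{D}(\Omega) \to \mathcal{D}(\Omega)$ which
multiplies a test function $\phi$ with the function $u$: $M_u(\phi) = u \cdot \phi$ (pointwise product).
Obviously we have $\operatorname{supp} u \cdot \phi \subseteq \operatorname{supp} \phi$. Hence, $M_u(\phi)$ has a compact support.
The product rule of differentiation for functions shows that $M_u(\phi) \in C^\infty(\Omega)$, and
therefore $M_u(\phi) \in \mathcal{D}(\Omega)$ and $M_u$ is well defined. Clearly, $M_u$ is a linear map
$\mathcal{D}(\Omega) \to \mathcal{D}(\Omega)$.

In order to prove continuity recall first the Leibniz formula

$$D^\alpha(f \cdot g) = \sum_{\beta + \gamma = \alpha} \frac{\alpha!}{\beta!\,\gamma!}\, D^\beta f \cdot D^\gamma g \quad \forall\, f, g \in C^{|\alpha|}(\Omega). \tag{4.5}$$

Here we use the multi-index notation: For $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}^n$ one defines
$\alpha! = \alpha_1! \cdots \alpha_n!$ and addition of multi-indices is as usual component-wise.
Now given any compact set $K \subset \Omega$ and $m \in \mathbb{N}$ we estimate as follows, for all
$\phi \in \mathcal{D}_K(\Omega)$:

$$p_{K,m}(M_u(\phi)) \le c\, p_{K,m}(u)\, p_{K,m}(\phi).$$

Here $c$ is a constant depending only on $m$ and $n$. The details of this estimate are left
as an Exercise. Since $u \in C^\infty(\Omega)$ we know that $p_{K,m}(u)$ is finite for every compact set
$K$ and every $m \in \mathbb{N}$. Proposition 2.6, thus, implies continuity of $M_u$. Therefore, its
adjoint $M_u'$ is a continuous linear map $\mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$ (see the arguments following
Eq. (4.1)). Hence, the multiplication with $C^\infty$-functions $u$,

$$u \cdot T = T \circ M_u = M_u'(T),$$

acts continuously on $\mathcal{D}'(\Omega)$.

The proof of the product rule for differentiation is a straightforward calculation.
Take any $T \in \mathcal{D}'(\Omega)$ and any $u \in C^\infty(\Omega)$. Using the abbreviation $\partial_j = \frac{\partial}{\partial x_j}$ we have
for all $\phi \in \mathcal{D}(\Omega)$,

$$\begin{aligned}
\langle \partial_j(u \cdot T), \phi \rangle &= -\langle u \cdot T, \partial_j \phi \rangle = -\langle T, u\,\partial_j \phi \rangle = -\langle T, \partial_j(u\phi) - \phi\,\partial_j u \rangle \\
&= -\langle T, \partial_j(u\phi) \rangle + \langle T, \phi\,\partial_j u \rangle = \langle \partial_j T, u\phi \rangle + \langle T, \phi\,\partial_j u \rangle \\
&= \langle u \cdot \partial_j T, \phi \rangle + \langle \partial_j u \cdot T, \phi \rangle = \langle u \cdot \partial_j T + \partial_j u \cdot T, \phi \rangle,
\end{aligned}$$

and the product rule follows.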
Finally, we prove compatibility of the multiplication for distributions with the
multiplication for $L^1_{\mathrm{loc}}(\Omega)$ under the embedding $I$. As we have seen earlier, $u \cdot f \in L^1_{\mathrm{loc}}(\Omega)$
for all $u \in C^\infty(\Omega)$ and all $f \in L^1_{\mathrm{loc}}(\Omega)$. Thus, given $f \in L^1_{\mathrm{loc}}(\Omega)$ and
$u \in C^\infty(\Omega)$, we calculate, for all $\phi \in \mathcal{D}(\Omega)$,

$$\langle u \cdot I_f, \phi \rangle = \langle I_f, u\phi \rangle = \int f(x)\,u(x)\,\phi(x)\,dx = \langle I_{uf}, \phi \rangle$$

and we conclude.
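The definitions of $D T$ and $u \cdot T$ as (modified) adjoints translate directly into code if a distribution is modeled as a function acting on test functions. The following is a minimal sketch under a loudly stated assumption: test functions are replaced by polynomials given as coefficient lists. Polynomials are not compactly supported, but $\delta$ and its derivatives only see finitely many Taylor coefficients at 0, so for these functionals the stand-in is harmless. All names (`p_mul`, `p_diff`, `delta`, `D`, `mul`) are illustrative.

```python
from typing import Callable, List

Poly = List[float]  # coefficients [c0, c1, ...] of c0 + c1*x + c2*x^2 + ...
Functional = Callable[[Poly], float]


def p_mul(a: Poly, b: Poly) -> Poly:
    """Product of two polynomials."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def p_diff(a: Poly) -> Poly:
    """Derivative of a polynomial."""
    return [k * a[k] for k in range(1, len(a))] or [0.0]


def delta(phi: Poly) -> float:
    """Dirac's delta at 0: phi -> phi(0)."""
    return phi[0]


def D(T: Functional) -> Functional:
    """Distributional derivative: <DT, phi> = -<T, phi'>."""
    return lambda phi: -T(p_diff(phi))


def mul(u: Poly, T: Functional) -> Functional:
    """Multiplication by a smooth function: <u.T, phi> = <T, u*phi>."""
    return lambda phi: T(p_mul(u, phi))


u = [1.0, 3.0, 1.0]           # u(x) = 1 + 3x + x^2
phi = [2.0, -1.0, 4.0, 5.0]   # stand-in for the Taylor data of a test function
T = D(delta)                  # T = delta'

lhs = D(mul(u, T))(phi)                              # <d(u.T), phi>
rhs = mul(p_diff(u), T)(phi) + mul(u, D(T))(phi)     # <u'.T + u.dT, phi>
print(lhs, rhs)

x = [0.0, 1.0]                # the function u(x) = x
print(mul(x, delta)(phi))     # x.delta = 0
print(mul(x, D(delta))(phi), -delta(phi))   # x.delta' = -delta
```

The first line of output shows both sides of the product rule (4.4) agreeing; the remaining lines illustrate $x \cdot \delta = 0$ and its consequence $x \cdot \delta' = -\delta$, obtained by applying (4.4) to $x \cdot \delta$.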
This proposition shows that the multiplicator space for distributions on $\Omega$ is
all of $C^\infty(\Omega)$, i.e., every $T \in \mathcal{D}'(\Omega)$ can be multiplied by every $u \in C^\infty(\Omega)$ to
give a distribution $u \cdot T$ on $\Omega$. In the case of tempered distributions, one has to take
growth restrictions into account and accordingly the multiplicator space for tempered
distributions on $\Omega$ is considerably smaller, as the following proposition shows:


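Proposition 4.3 Denote by $\mathcal{O}_m(\mathbb{R}^n)$ the space of all $C^\infty$-functions $u$ on $\mathbb{R}^n$ such
that for every $\alpha \in \mathbb{N}^n$ there are a constant $C$ and an integer $m$ such that

$$|D^\alpha u(x)| \le C\,(1 + x^2)^{m/2} \quad \forall\, x \in \mathbb{R}^n.$$

Then every $T \in \mathcal{S}'(\mathbb{R}^n)$ can be multiplied by every $u \in \mathcal{O}_m(\mathbb{R}^n)$ and $u \cdot T \in \mathcal{S}'(\mathbb{R}^n)$.


Proof We have to show that multiplication by $u \in \mathcal{O}_m(\mathbb{R}^n)$, $\phi \mapsto u \cdot \phi$, is a continuous
linear map $\mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$. Using Leibniz' formula, this is a straightforward
calculation. For the details we refer to the Exercises.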


4.3. Transformation of Variables 


As in classical analysis it is often helpful to be able to work with distributions in 
different coordinate systems. This amounts to a change of variables in which the 
distributions are considered. Since in general distributions are defined through their 
action on test functions, these changes of variables have to take place first on the 
level of test functions and then by taking adjoints, on the level of distributions. 
This requires that admissible transformations of variables have to take test function 
spaces into test functions and in this way they are considerably more restricted than 
in classical analysis. 

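Let $\Omega_x \subseteq \mathbb{R}^n_x$ be a nonempty open set and $\sigma : \Omega_x \to \Omega_y$, $\Omega_y \subseteq \mathbb{R}^n_y$, a differentiable
bijective mapping from $\Omega_x$ onto the open set $\Omega_y = \sigma(\Omega_x)$. Then the
determinant of the derivative of this mapping does not vanish: $\det \frac{\partial \sigma}{\partial x} \neq 0$ on $\Omega_x$.
We assume $\sigma \in C^\infty(\Omega_x)$. It follows that the inverse transformation $\sigma^{-1}$ is a $C^\infty$-transformation
from $\Omega_y$ onto $\Omega_x$ and compact subsets $K \subset \Omega_x$ are transformed onto
compact subsets $\sigma(K)$ in $\Omega_y$. The chain rule for functions implies that $\phi \circ \sigma^{-1}$ is of
class $C^\infty$ on $\Omega_y$ for every $\phi \in \mathcal{D}(\Omega_x)$. Hence

$$\phi \mapsto \phi \circ \sigma^{-1}$$

is a well-defined mapping $\mathcal{D}(\Omega_x) \to \mathcal{D}(\Omega_y)$. In the Exercises, we show that this
mapping is actually continuous. In the Exercises, we also prove that

$$\left| \det \frac{\partial \sigma^{-1}}{\partial y} \right| \in C^\infty(\Omega_y).$$

The well-known formula for the change of variables in integrals will guide us to a
definition of the change of variables for distributions, which is compatible with the
embedding of functions into the space of distributions. Take any $f \in L^1_{\mathrm{loc}}(\Omega_y)$ and
calculate for all $\phi \in \mathcal{D}(\Omega_x)$,

$$\int_{\Omega_x} f(\sigma(x))\,\phi(x)\,dx = \int_{\Omega_y} f(y)\,\phi(\sigma^{-1}(y)) \left| \det \frac{\partial \sigma^{-1}}{\partial y} \right| dy,$$

i.e., $\langle I_{f \circ \sigma}, \phi \rangle = \left\langle \left| \det \frac{\partial \sigma^{-1}}{\partial y} \right| \cdot I_f,\ \phi \circ \sigma^{-1} \right\rangle$. Accordingly, one defines the change of
variables for distributions.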
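Definition 4.2 Let $\sigma : \Omega_x \to \Omega_y$ be a bijective $C^\infty$-transformation from a
nonempty open set $\Omega_x \subseteq \mathbb{R}^n$ onto a (nonempty open) set $\Omega_y$. To every distribution
$T$ on $\Omega_y = \sigma(\Omega_x)$ one assigns a distribution $T \circ \sigma$ of new variables on $\Omega_x$
which is defined by the following formula for the transformation of variables:

$$\langle T \circ \sigma, \phi \rangle = \left\langle \left| \det \frac{\partial \sigma^{-1}}{\partial y} \right| \cdot T,\ \phi \circ \sigma^{-1} \right\rangle \quad \forall\, \phi \in \mathcal{D}(\Omega_x). \tag{4.6}$$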


Proposition 4.4 For the transformation of variables as defined above, the chain
rule holds, i.e., if $T \in \mathcal{D}'(\Omega_y)$ and $\sigma = (\sigma_1, \ldots, \sigma_n)$ is a bijective $C^\infty$-transformation,
then one has for $j = 1, \ldots, n$,

$$D_j(T \circ \sigma) = \sum_{i=1}^{n} (\partial_j \sigma_i) \cdot (D_i T) \circ \sigma.$$


Proof Since we will not use this rule in an essential way we refer for a proof to the
literature [1].

In applications in physics, typically, rather special cases of this general formula are 
used, mainly to formulate symmetry or invariance properties of the system. Usually 
these symmetry properties are defined through transformations of the coordinate 
space, such as translations, rotations, and Galileo or Lorentz transformations. We 
give a simple concrete example. 

Let $A$ be a constant $n \times n$ matrix with nonvanishing determinant and $a \in \mathbb{R}^n$
some vector. Define a transformation $\sigma : \mathbb{R}^n \to \mathbb{R}^n$ by $y = \sigma(x) = Ax + a$ for
all $x \in \mathbb{R}^n$. This transformation certainly satisfies all our assumptions. Its inverse is
$x = \sigma^{-1}(y) = A^{-1}(y - a)$ for all $y \in \mathbb{R}^n$ and thus $\frac{\partial \sigma^{-1}}{\partial y} = A^{-1}$. Given $T \in \mathcal{D}'(\mathbb{R}^n)$
we want to determine its transform under $\sigma$. According to Eq. (4.6) it is given by
$\langle T \circ \sigma, \phi \rangle = |\det A^{-1}|\, \langle T, \phi \circ \sigma^{-1} \rangle$ for all $\phi \in \mathcal{D}(\mathbb{R}^n)$. The situation becomes
more transparent when we write the different variables explicitly as arguments of the
distribution and the test function:

$$\langle (T \circ \sigma)(x), \phi(x) \rangle = \langle T(Ax + a), \phi(x) \rangle = \frac{1}{|\det A|}\, \langle T(y), \phi(A^{-1}(y - a)) \rangle.$$

In particular for $A = \mathbb{1}_n$ ($\mathbb{1}_n$ is the identity matrix in dimension $n$) this formula
describes translations by $a \in \mathbb{R}^n$. With the abbreviations $T_a(x) = T(x + a)$ and
$\phi_a(y) = \phi(y - a)$ we have

$$\langle T_a, \phi \rangle = \langle T_a(x), \phi(x) \rangle = \langle T(y), \phi_a(y) \rangle.$$

Knowing what the translation of a distribution by $a \in \mathbb{R}^n$ is, one can easily formulate
periodicity of distributions: A distribution $T \in \mathcal{D}'(\mathbb{R}^n)$ is said to be periodic with
period $a \in \mathbb{R}^n$ if, and only if, $T_a = T$.
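For a regular distribution $T = I_f$ the affine transformation formula is just the substitution rule for integrals, which can be checked numerically in dimension $n = 1$. The following is a minimal sketch; the choices $f = \cos$, $A = -2$, $a = 0.5$, the bump test function and the midpoint quadrature are illustrative assumptions.

```python
import math


def bump(x: float) -> float:
    """Test function exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def integrate(g, a: float, b: float, n: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


A, a = -2.0, 0.5   # sigma(x) = A*x + a with det A = A != 0
f = math.cos       # a locally integrable function defining T = I_f

# Left side: <T(Ax + a), phi(x)>, phi supported in [-1, 1].
lhs = integrate(lambda x: f(A * x + a) * bump(x), -1.0, 1.0)

# Right side: (1/|det A|) <T(y), phi(A^{-1}(y - a))>; the transformed test
# function is supported in [a - |A|, a + |A|].
rhs = integrate(lambda y: f(y) * bump((y - a) / A), a - abs(A), a + abs(A)) / abs(A)

print(lhs, rhs)
```

Both numbers agree up to rounding; note the absolute value of the determinant, which matters here because $A < 0$ reverses orientation.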

Another interesting application of the translation of distributions is to define the 
derivative as the limit of difference quotients as is done for functions. One would 
expect that this definition agrees with the definition of the derivative for distributions 
given earlier. This is indeed the case as the following corollary shows. 


Corollary 4.1 Let $T$ be a distribution on a nonempty open set $\Omega \subseteq \mathbb{R}^n$ and $a \in \mathbb{R}^n$
some vector. Denote by $T_a$ the translated distribution as introduced above. Then

$$\lim_{t \to 0} \frac{T_{ta} - T}{t} = a \cdot DT \quad \text{in } \mathcal{D}'(\Omega). \tag{4.7}$$

Here $DT = (\partial_1 T, \ldots, \partial_n T)$ denotes the distributional derivative as given in
Definition 4.1.


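Proof Given $a \in \mathbb{R}^n$ and $T \in \mathcal{D}'(\Omega)$ choose any $\phi \in \mathcal{D}(\Omega)$. Then, for $t \in \mathbb{R}$, $t \neq 0$
and sufficiently small, we know that $\phi_{ta} \in \mathcal{D}(\Omega)$ too. For these numbers $t$ we have

$$\left\langle \frac{T_{ta} - T}{t}, \phi \right\rangle = \left\langle T, \frac{\phi_{ta} - \phi}{t} \right\rangle.$$

In the Exercises, we show that

$$\lim_{t \to 0} \frac{\phi_{ta} - \phi}{t} = -a \cdot D\phi \quad \text{in } \mathcal{D}(\Omega).$$

Here our notation is $a \cdot D\phi = a_1 \partial_1 \phi + \cdots + a_n \partial_n \phi$. Using continuity of $T$ on $\mathcal{D}(\Omega)$
in the first step and Definition 4.1 in the last step, it follows that

$$\lim_{t \to 0} \left\langle \frac{T_{ta} - T}{t}, \phi \right\rangle = -\langle T, a \cdot D\phi \rangle = \langle a \cdot DT, \phi \rangle,$$

and thus we conclude.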
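As an illustration of (4.7), take $n = 1$ and $T = I_\theta$ with the Heaviside function $\theta$. Since $D I_\theta = \delta$, the difference quotients paired with $\phi$ should approach $a\,\phi(0)$. The following minimal sketch checks this numerically; the bump test function, the value $a = 0.7$, the name `difference_quotient` and the midpoint quadrature are illustrative assumptions.

```python
import math


def bump(x: float) -> float:
    """Test function exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def integrate(g, a: float, b: float, n: int = 20000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def difference_quotient(t: float, a: float) -> float:
    """<(T_{ta} - T)/t, phi> for T = I_theta, computed as <T, (phi_{ta} - phi)/t>.

    Here phi_{ta}(y) = phi(y - t*a). For |t*a| <= 1 all supports lie in [-1, 2],
    and pairing with theta restricts the integral to [0, 2].
    """
    return integrate(lambda y: (bump(y - t * a) - bump(y)) / t, 0.0, 2.0)


a = 0.7
limit = a * bump(0.0)  # <a . D I_theta, phi> = a * <delta, phi> = a * phi(0)

for t in (1e-1, 1e-2, 1e-3):
    print(t, difference_quotient(t, a), limit)
```

The printed difference quotients approach $a\,\phi(0) = 0.7\,e^{-1}$ as $t \to 0$, with an error of order $t^2$ in this example because $\phi'(0) = 0$.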


4.4 Some Applications 


4.4.1 Distributions with Support in a Point 


Thus far we have developed elementary calculus for distributions and we have learned 
about the localization of distributions. This subsection discusses some related results. 
A first proposition states that the differentiation of distributions and the multiplication
of distributions with $C^\infty$-functions are local operations on distributions since under
these operations the support is "conserved."


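Proposition 4.5 Suppose $\Omega \subseteq \mathbb{R}^n$ is open and nonempty. Then:

1. $\operatorname{supp}(D^\alpha T) \subseteq \operatorname{supp} T$ for every $T \in \mathcal{D}'(\Omega)$ and every $\alpha \in \mathbb{N}^n$.
2. $\operatorname{supp}(u \cdot T) \subseteq \operatorname{supp} T$ for every $T \in \mathcal{D}'(\Omega)$ and every $u \in C^\infty(\Omega)$.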


The proof of these two simple statements is suggested as an exercise. Here we want
to point out that in both statements the relation $\subseteq$ cannot be replaced by $=$. This can
be seen by looking at some simple examples, for instance take $T = I_c$ on $\mathbb{R}$ for some
constant $c \neq 0$. Then $\operatorname{supp} T = \mathbb{R}$ but for $\alpha \ge 1$ we have $D^\alpha T = I_{D^\alpha c} = I_0 = 0$.
And for $u(x) = x$, $u \in C^\infty(\mathbb{R})$, and $T = \delta \in \mathcal{D}'(\mathbb{R})$ one has $u \cdot \delta = 0$ while
$\operatorname{supp} \delta = \{0\}$.

It is also instructive to observe that $\phi(x) = 0$ for all $x \in \operatorname{supp} T$ does not imply
$T(\phi) = 0$, in contrast to the situation for measures. Take, for example, $T = D^\alpha \delta$
with $|\alpha| \ge 1$ and a test function $\phi$ with $\phi(0) = 0$ and $\phi^{(\alpha)}(0) \neq 0$.

Recall Proposition 4.1 where we showed that the simple ordinary differential
equation $D^n T = 0$ has also in $\mathcal{D}'(\mathbb{R})$ only the classical solutions. For ordinary
differential equations whose coefficients are not constant the situation can be very
different. We look at the simplest case, the equation $x^{n+1} \cdot T = 0$ on $\mathbb{R}$. In $L^1_{\mathrm{loc}}(\mathbb{R})$,
we only have the trivial solution, but not in $\mathcal{D}'(\mathbb{R})$ as the following proposition shows.


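Proposition 4.6 $T \in \mathcal{D}'(\mathbb{R})$ solves the equation

$$x^{n+1} \cdot T = 0 \quad \text{in } \mathcal{D}'(\mathbb{R}) \tag{4.8}$$

if, and only if, $T$ is of the form

$$T = \sum_{i=0}^{n} c_i D^i \delta \tag{4.9}$$

with certain constants $c_i$.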


Proof If T is of the form (4.9) then, for all ¢ € D(R), we have (x"*!.7T,¢) = 
(T°) = Fo GD" 6) = ck = 1G" 6) = 0. since 
(x"+!)(O) = 0 for all i < n. hence T solves Eq. (4.8). 

Now assume conversely that T is a solution of Eq. (4.8). In a first step, we show 
indirectly that T has a support contained in {0}. Suppose x9 € supp T and xp 4 0. 
Then there is a neighborhood U of x9 which does not contain the point x = 0 and there 


56 4 Calculus for Distributions 


is a test function yw € D(U) such that T(yw) # 0. It follows that @ = x-"t Py € 
D(U). Since x”t!. T = 0 we get 0 = (x"*!. T)) = Tx"*!o) = TW), a 
contradiction. Therefore, supp T C {0}. 

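Now choose some test function $\rho \in \mathcal{D}(\mathbb{R})$ with $\rho(x) = 1$ for all $x \in (-s, s)$ for
some $s > 0$, as constructed in the Exercises. Then, for any $\phi \in \mathcal{D}(\mathbb{R})$, we know that
$\psi = (1 - \rho)\phi$ has its support in $\mathbb{R} \setminus \{0\}$ and hence $0 = T(\psi) = T(\phi) - T(\rho\phi)$.
Using Taylor's Theorem one can write

$$\phi(x) = \sum_{i=0}^{n} \frac{\phi^{(i)}(0)}{i!}\, x^{i} + x^{n+1}\phi_1(x)$$

with

$$\phi_1(x) = \frac{1}{n!} \int_0^1 (1 - t)^{n}\, \phi^{(n+1)}(tx)\, dt \in C^{\infty}(\mathbb{R}).$$

This allows us to approximate the test function $\phi$ near $x = 0$ by a polynomial, and
the resulting approximation in $\mathcal{D}(\mathbb{R})$ is

$$\rho\phi = \sum_{i=0}^{n} \frac{\phi^{(i)}(0)}{i!}\, x^{i}\rho + x^{n+1}\phi_2$$

with $\phi_2 = \rho\phi_1 \in \mathcal{D}(\mathbb{R})$. Thus

$$T(\phi) = T(\rho\phi) = \sum_{i=0}^{n} \frac{\phi^{(i)}(0)}{i!}\, T(x^{i}\rho) + T(x^{n+1}\phi_2) = \sum_{i=0}^{n} c_i \langle D^{i}\delta, \phi \rangle,$$

since $T(x^{n+1}\phi_2) = (x^{n+1} T)(\phi_2) = 0$. And we conclude that (4.9) holds with $c_i =
\frac{(-1)^{i}}{i!}\, T(x^{i}\rho)$.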
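As an illustration of the formula for the constants (a worked special case added here, not part of the original proof), take $n = 1$ and recall $\langle \delta', \phi \rangle = -\phi'(0)$:

```latex
\begin{align*}
\phi(x) &= \phi(0) + \phi'(0)\,x + x^{2}\phi_1(x), \\
T(\phi) &= T(\rho\phi)
         = \phi(0)\,T(\rho) + \phi'(0)\,T(x\rho) + T(x^{2}\rho\phi_1)
         = \phi(0)\,T(\rho) + \phi'(0)\,T(x\rho), \\
T &= c_0\,\delta + c_1\,\delta', \qquad c_0 = T(\rho), \quad c_1 = -\,T(x\rho).
\end{align*}
```

The term $T(x^{2}\rho\phi_1)$ drops out because $x^{2} \cdot T = 0$, and the sign of $c_1$ agrees with $c_i = \frac{(-1)^{i}}{i!} T(x^{i}\rho)$ for $i = 1$.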
There is a multidimensional version of this result which will be addressed in
the Exercises. Though its proof relies on the same principle it is technically more 
involved. 


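Proposition 4.7 A distribution $T \in \mathcal{D}'(\mathbb{R}^n)$ has its support in the point $x_0 \in \mathbb{R}^n$ if,
and only if, $T$ is of the form (3.15) for some $m \in \mathbb{N}$ and some coefficients $c_\alpha \in \mathbb{K}$,
i.e.,

$$T = \sum_{|\alpha| \leq m} c_\alpha D^{\alpha}\delta_{x_0}.$$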


Proof The proof that any distribution $T \in \mathcal{D}'(\mathbb{R}^n)$ which has its support in a point
$x_0 \in \mathbb{R}^n$ is necessarily of the form (3.15) is given here explicitly only for the case
$n = 1$ and $x_0 = 0$. The general case is left as an exercise.

Thus, we assume that $T \in \mathcal{D}'(\mathbb{R})$ has its support in $\{0\}$. We will show that
then $T$ solves the equation $x^{m+1} \cdot T = 0$ for some $m \in \mathbb{N}$, and we conclude by
Proposition 4.6.

As in the proof of this proposition we choose some test function $\rho \in \mathcal{D}(\mathbb{R})$ with
$\rho(x) = 1$ for all $x \in (-s, s)$ for some $0 < s < 1$ and support in $K = [-1, 1]$, as
constructed in the Exercises, and define for $0 < r < 1$ the function $\rho_r(x) = \rho(\frac{x}{r})$.
This function belongs to $\mathcal{D}(\mathbb{R})$, has its support in $[-r, r]$ and is equal to $1$ in $(-rs, rs)$.
Then, for any $\psi \in \mathcal{D}(\mathbb{R})$ the function $\phi = (1 - \rho_r)\psi$ belongs to $\mathcal{D}(\mathbb{R} \setminus \{0\})$ and thus
$T(\phi) = 0$ since $\operatorname{supp} T \subseteq \{0\}$, or by linearity of $T$, $T(\psi) = T(\rho_r\psi)$.

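Since $T$ is continuous on $\mathcal{D}_K(\mathbb{R})$ there are a constant $C \in \mathbb{R}^+$ and $m \in \mathbb{N}$ such
that $|T(\phi)| \leq C\, p_{K,m}(\phi)$ for all $\phi \in \mathcal{D}_K(\mathbb{R})$. Apply this estimate to $\psi = x^{m+1}\rho_r\phi$
for all $\phi \in \mathcal{D}_K(\mathbb{R})$ to get

$$|T(x^{m+1}\rho_r\phi)| \leq C\, p_{K,m}(x^{m+1}\rho_r\phi).$$

In the proof that multiplication by $C^{\infty}$-functions is continuous on $\mathcal{D}(\Omega)$, we have
shown the estimate $p_{K,m}(u\phi) \leq c\, p_{K,m}(u)\, p_{K,m}(\phi)$ for all $u \in C^{\infty}(\Omega)$ and all $\phi \in
\mathcal{D}_K(\Omega)$ with some constant $c \in \mathbb{R}^+$ depending only on $m$ and the dimension $n$. We
apply this here for $u = x^{m+1}\rho_r$ and get

$$|T(x^{m+1}\rho_r\phi)| \leq C c\, p_{K,m}(x^{m+1}\rho_r)\, p_{K,m}(\phi) \quad \forall\, \phi \in \mathcal{D}_K(\mathbb{R}).$$

The first factor we estimate as follows, using Leibniz' formula and the identities
$D^{\beta} x^{m+1} = \frac{(m+1)!}{(m+1-\beta)!}\, x^{m+1-\beta}$ and $D^{\gamma}\rho_r(x) = r^{-\gamma}(D^{\gamma}\rho)(\frac{x}{r})$:

$$\begin{aligned}
p_{K,m}(x^{m+1}\rho_r)
&\leq \sup_{\alpha \leq m}\, \sup_{x \in K} \sum_{\beta+\gamma=\alpha} \frac{\alpha!}{\beta!\,\gamma!}\, \frac{(m+1)!}{(m+1-\beta)!}\, \Big| x^{m+1-\beta}\, r^{-\gamma}\, (D^{\gamma}\rho)\big(\tfrac{x}{r}\big) \Big| \\
&\leq \sup_{\alpha \leq m}\, \sup_{x \in K} \sum_{\beta+\gamma=\alpha} \frac{\alpha!}{\beta!\,\gamma!}\, \frac{(m+1)!}{(m+1-\beta)!}\, r^{m+1-\beta-\gamma}\, \Big| \big(\tfrac{x}{r}\big)^{m+1-\beta} (D^{\gamma}\rho)\big(\tfrac{x}{r}\big) \Big| \\
&\leq r c\, p_{K,m}(\rho).
\end{aligned}$$

In the last step we used $r^{m+1-\beta-\gamma} \leq r$ for $\beta + \gamma = \alpha \leq m$ and $0 < r < 1$, and $|x/r| \leq 1$ on the
support of $\rho_r$. Now collect all estimates to get, for each $\phi \in \mathcal{D}_K(\mathbb{R})$ and all $0 < r < 1$,
with a constant $C$ which does not depend on $r$,

$$|T(x^{m+1}\phi)| = |T(x^{m+1}\rho_r\phi)| \leq C r\, p_{K,m}(\rho)\, p_{K,m}(\phi).$$

Taking the limit $r \to 0$, it follows that $T(x^{m+1}\phi) = 0$ for every $\phi \in \mathcal{D}_K(\mathbb{R})$ and
hence

$$x^{m+1} \cdot T = 0 \quad \text{in } \mathcal{D}'(\mathbb{R})$$

and we conclude.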


4.4.2 Renormalization of $(\frac{1}{x})_+ = \frac{\theta(x)}{x}$


As an application of Proposition 4.7 we discuss a problem which plays a fundamental
role in relativistic quantum field theory. Renormalization is about giving formal integrals
which do not exist in the Lebesgue sense a mathematically consistent meaning.
Here, the perspective given by distribution theory is very helpful.


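Denote by $\theta$ as usual Heaviside's function. As we have seen earlier, in the context
of introducing Cauchy's principal value, the function $(\frac{1}{x})_+ = \frac{1}{x}\theta(x)$ is not
locally integrable on $\mathbb{R}$ and thus $(\frac{1}{x})_+$ cannot be used directly to define a regular
distribution. Consider the subspace $\mathcal{D}_0(\mathbb{R}) = \{\phi \in \mathcal{D}(\mathbb{R}) : \phi(0) = 0\}$ of the test
function space $\mathcal{D}(\mathbb{R})$. Every $\phi \in \mathcal{D}_0(\mathbb{R})$ has the representation $\phi(x) = x\psi(x)$ with
$\psi(x) = \int_0^1 \phi'(tx)\, dt \in \mathcal{D}(\mathbb{R})$. Thus, we get a definition of $(\frac{1}{x})_+$ as a continuous
linear function $\mathcal{D}_0(\mathbb{R}) \to \mathbb{K}$ which agrees on $(0, \infty)$, i.e., on the test function space
$\mathcal{D}(\mathbb{R} \setminus \{0\})$, with the function $\frac{1}{x}\theta(x)$, by the formula

$$\Big\langle \big(\tfrac{1}{x}\big)_+, \phi \Big\rangle = \int_0^{\infty} \frac{\phi(x)}{x}\, dx. \tag{4.10}$$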
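If $K$ is any compact subset of $\mathbb{R}$ we get, for all $\phi \in \mathcal{D}_K(\mathbb{R}) \cap \mathcal{D}_0(\mathbb{R})$, the estimate
($|K|$ denotes the measure of the set $K$)

$$\Big| \Big\langle \big(\tfrac{1}{x}\big)_+, \phi \Big\rangle \Big| \leq |K|\, p_{K,1}(\phi)$$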

which shows that Eq. (4.10) defines $(\frac{1}{x})_+ = T_0$ as a continuous linear functional
of order 1 on $\mathcal{D}_0(\mathbb{R})$. By the Hahn-Banach Theorem (see, for instance, [3, 2]) the
functional $T_0$ has many continuous linear extensions $T$ to all of $\mathcal{D}(\mathbb{R})$, of the same
order 1 as $T_0$. This means the following: $T$ is a continuous linear functional $\mathcal{D}(\mathbb{R}) \to
\mathbb{K}$ such that $|T(\phi)| \leq |K|\, p_{K,1}(\phi)$ for all $\phi \in \mathcal{D}_K(\mathbb{R})$ and $T|\mathcal{D}_0(\mathbb{R}) = T_0$. Such
extensions $T$ of $T_0$ are called renormalizations of $(\frac{1}{x})_+$. How many renormalizations
of $(\frac{1}{x})_+$ do we get? This can be decided with the help of Proposition 4.7. Since
$T|\mathcal{D}_0(\mathbb{R}) = T_0$ we find that $T$ can differ from $T_0$ only by a distribution with support
in $\{0\}$, and since we know the orders of $T$ and $T_0$ it follows from Proposition 4.7 that

$$T = T_0 + c_0\delta + c_1\delta'$$

with some constants $c_i \in \mathbb{K}$.

In physics, a special 1-parameter family of renormalizations is considered. This
choice is motivated by the physical context in which the renormalization problem
occurs. For any $0 < M < \infty$ define for all $\phi \in \mathcal{D}(\mathbb{R})$,

$$\Big\langle \big(\tfrac{1}{x}\big)_{+,M}, \phi \Big\rangle = \int_0^{M} \frac{\phi(x) - \phi(0)}{x}\, dx + \int_M^{\infty} \frac{\phi(x)}{x}\, dx.$$


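It follows easily that $(\frac{1}{x})_{+,M}$ is a distribution on $\mathbb{R}$ and $(\frac{1}{x})_{+,M}|\mathcal{D}_0(\mathbb{R}) = (\frac{1}{x})_+$. Thus,
$(\frac{1}{x})_{+,M}$ is a renormalization of $(\frac{1}{x})_+$. If $(\frac{1}{x})_{+,M'}$ is another renormalization of this
family, a straightforward calculation shows that

$$\big(\tfrac{1}{x}\big)_{+,M} - \big(\tfrac{1}{x}\big)_{+,M'} = -\ln\Big(\frac{M}{M'}\Big)\, \delta.$$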
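The dependence on $M$ can be checked numerically. The following is a minimal sketch, assuming a rescaled standard bump function as test function and a composite midpoint rule for the integrals; the function names are hypothetical and chosen only for this illustration.

```python
import math

def bump(x, a=2.0):
    """Smooth test function with support [-a, a] (rescaled standard bump)."""
    w = 1.0 - (x / a) ** 2
    return math.exp(-1.0 / w) if w > 0.0 else 0.0

def midpoint(f, lo, hi, steps=20000):
    """Composite midpoint rule; it never evaluates f at the endpoints."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

def renormalized_pairing(phi, M, support_end):
    """<(1/x)_{+,M}, phi> for a test function phi which vanishes beyond support_end."""
    phi0 = phi(0.0)
    inner = midpoint(lambda x: (phi(x) - phi0) / x, 0.0, M)   # integral over (0, M)
    outer = midpoint(lambda x: phi(x) / x, M, support_end)    # integral over (M, infinity)
    return inner + outer

M, M_prime = 1.0, 0.5
difference = renormalized_pairing(bump, M, 2.0) - renormalized_pairing(bump, M_prime, 2.0)
expected = -math.log(M / M_prime) * bump(0.0)                 # -ln(M/M') * phi(0)
```

Up to quadrature error, `difference` and `expected` agree, so changing $M$ only shifts the renormalization by a multiple of $\delta$.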


Therefore, $(\frac{1}{x})_{+,M}$, $0 < M < \infty$, is a 1-parameter family of renormalizations of
$(\frac{1}{x})_+$. Now compare $(\frac{1}{x})_{+,M}$ with any other renormalization $T$ of $(\frac{1}{x})_+$. Since both
renormalizations are equal to $T_0$ on $\mathcal{D}_0(\mathbb{R})$, and since we know $T - (\frac{1}{x})_{+,M} = c_0\delta + c_1\delta'$,
we get $0 = c_0\phi(0) - c_1\phi'(0)$ for all $\phi \in \mathcal{D}_0(\mathbb{R})$. But $\phi(0) = 0$ for functions in
$\mathcal{D}_0(\mathbb{R})$, hence $c_1 = 0$. We conclude: Any renormalization of $(\frac{1}{x})_+$ differs from the
renormalization $(\frac{1}{x})_{+,M}$ only by $c_0\delta$. Thus in this renormalization procedure only one
free constant appears.

Similar to the term $\ln(\frac{M}{M'})$ above, in the renormalization theory of relativistic
quantum field theory free constants occur (as renormalized mass or renormalized
charge, for instance). In this way our simple example reflects the basic ideas of
the renormalization theory of relativistic quantum field theory as developed by N.
Bogoliubov, O. S. Parasiuk, K. Hepp and later H. Epstein and V. Glaser [4-8].


4.5 Exercises 


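1. For $f \in C^{\infty}(\mathbb{R}^n)$ show that
$$\langle I_{D^{\alpha} f}, \phi \rangle = (-1)^{|\alpha|} \langle I_f, D^{\alpha}\phi \rangle \quad \forall\, \phi \in \mathcal{D}(\mathbb{R}^n).$$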


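2. Prove the following equation in the sense of distributions on $\mathbb{R}$:
$$\frac{d}{dx} \log|x| = \mathrm{vp}\, \frac{1}{x}.$$

Hints: Since $\log|x| \in L^1_{\mathrm{loc}}(\mathbb{R})$ one has, for any $\phi \in \mathcal{D}(\mathbb{R})$,

$$\int \log(|x|)\, \phi(x)\, dx = \lim_{\varepsilon \to 0} \int_{|x| \geq \varepsilon} \log(|x|)\, \phi(x)\, dx.$$

Recall in addition: $\lim_{\varepsilon \to 0} \varepsilon \log \varepsilon = 0$.

3. Using the relation $\log(x + iy) = \log|x + iy| + i \arg(x + iy)$ for $x, y \in \mathbb{R}$ prove
that the following equation holds in the sense of distributions on $\mathbb{R}$:

$$\frac{d}{dx} \log(x + i0) = \mathrm{vp}\, \frac{1}{x} - i\pi\, \delta(x).$$

4. Show: A test function $\phi \in \mathcal{D}(\mathbb{R})$ is the derivative of some other test function
$\psi \in \mathcal{D}(\mathbb{R})$, $\phi = \psi'$, if, and only if, $I(\phi) = \int_{-\infty}^{\infty} \phi(x)\, dx = 0$.

5. In calculus, we certainly have the identity $c = D^{n} P_n$ with $P_n(x) = \frac{c}{n!} x^{n} +
P_{n-1}(x)$ for any polynomial $P_{n-1}$ of degree $\leq n - 1$. Show that this identity also
holds in $\mathcal{D}'(\mathbb{R})$, i.e., show the identity $I_c = D^{n} I_{P_n}$.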

6. Let $u \in C^{\infty}(\Omega)$ be a classical solution of the constant coefficient partial
differential equation (4.3). Prove: $P(D) I_u = I_f$ in $\mathcal{D}'(\Omega)$, hence $u$ solves this partial
differential equation in the sense of distributions.

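7. For $u \in C^{\infty}(\Omega)$, $\phi \in \mathcal{D}(\Omega)$, $K \subset \Omega$ compact, and $m \in \mathbb{N}$, show that

$$p_{K,m}(M_u(\phi)) \leq C\, p_{K,m}(u)\, p_{K,m}(\phi)$$

for some constant $C$ which depends only on $m$ and $n$.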


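Hints: For all $x \in K$ and $|\alpha| \leq m$, one can estimate as follows:

$$\Big| \sum_{\beta+\gamma=\alpha} \frac{\alpha!}{\beta!\,\gamma!}\, D^{\beta}u(x)\, D^{\gamma}\phi(x) \Big| \leq \sum_{\beta+\gamma=\alpha} \frac{\alpha!}{\beta!\,\gamma!}\, \sup_{x \in K} |D^{\beta}u(x)|\, \sup_{x \in K} |D^{\gamma}\phi(x)|.$$

8. Show: If $u \in \mathcal{O}_m(\mathbb{R}^n)$ and $\phi \in \mathcal{S}(\mathbb{R}^n)$, then $M_u(\phi) = u \cdot \phi \in \mathcal{S}(\mathbb{R}^n)$ and
$M_u : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$ is linear and continuous.

9. Let $\sigma : \Omega_x \to \Omega_y$ be a bijective $C^{\infty}$-transformation from a nonempty open set
$\Omega_x \subseteq \mathbb{R}^n$ onto a (nonempty open) set $\Omega_y$. Show:
a) $\phi \mapsto \phi \circ \sigma^{-1}$ is a continuous linear mapping $\mathcal{D}(\Omega_x) \to \mathcal{D}(\Omega_y)$.
b) $|\det \frac{\partial \sigma}{\partial x}| \in C^{\infty}(\Omega_x)$.

10. Given any $\phi \in \mathcal{D}(\Omega)$ and $a \in \mathbb{R}^n$ prove that

$$\lim_{t \to 0} \frac{\phi_{ta} - \phi}{t} = -a \cdot D\phi \quad \text{in } \mathcal{D}(\Omega),$$

where $\phi_{ta}(x) = \phi(x - ta)$.
Hints: Show first:

$$\frac{\phi(x - ta) - \phi(x)}{t} + (a \cdot D\phi)(x) = a \cdot \int_0^1 [D\phi(x) - D\phi(x - sta)]\, ds$$

and then estimate the relevant semi-norms for $t \to 0$.

11. Given a closed interval $[a, b] \subset \mathbb{R}$ and $\varepsilon > 0$, construct a function $\phi \in \mathcal{D}(\mathbb{R})$
such that $\operatorname{supp} \phi \subseteq [a - \varepsilon, b + \varepsilon]$ and $\phi(x) = 1$ for all $x \in (a + \varepsilon, b - \varepsilon)$. (We
assume $\varepsilon < b - a$.)
Hints: Normalize the function $\rho$ in (2.14) such that $\int \rho(x)\, dx = 1$ and define,
for $0 < r < 1$, $\rho_r(x) = \frac{1}{r}\rho(\frac{x}{r})$. Then define a function $u_r$ on $\mathbb{R}$ by $u_r(x) =
\int_{-1}^{1} \rho_r(x - y)\, dy$. Show: $u_r \in \mathcal{D}(\mathbb{R})$, $\operatorname{supp} u_r \subseteq [-1 - r, 1 + r]$, and $u_r(x) = 1$
for all $x \in (-1 + r, 1 - r)$. Finally, translation and rescaling produces a function
with the required properties.

12. Given a closed ball $K_r(x_0) = \{x \in \mathbb{R}^n : |x - x_0| \leq r\}$ and $0 < \varepsilon < r$, construct
a function $\phi \in \mathcal{D}(\mathbb{R}^n)$ such that $\phi(x) = 1$ for all $x \in \mathbb{R}^n$ with $|x - x_0| < r - \varepsilon$
and $\operatorname{supp} \phi \subseteq K_{r+\varepsilon}(x_0)$.
Hints: The strategy of the one-dimensional case applies.

13. Prove: For every $u \in \mathcal{O}_m(\mathbb{R}^n)$ (see Proposition 4.3) the multiplication by $u$,
$\phi \mapsto u \cdot \phi$, is a continuous linear map from $\mathcal{S}(\mathbb{R}^n)$ into $\mathcal{S}(\mathbb{R}^n)$.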


References 


1. Bogolubov NN, Logunov AA, Oksak AI, Todorov IT. General principles of quantum field
theory. Vol. 10 of mathematical physics and applied mathematics. Dordrecht: Kluwer Academic
Publishers; 1990.

2. Epstein H, Glaser V. The rôle of locality in perturbation theory. Ann Inst Henri Poincaré A.
1973;19:211.

3. Hepp K. Théorie de la renormalisation. Vol. 2 of lecture notes in physics. Berlin: Springer-Verlag;
1969.

4. Hörmander L. The analysis of linear partial differential operators. 1. Distribution theory and
Fourier analysis. Berlin: Springer-Verlag; 1983.

5. Hörmander L. The analysis of linear partial differential operators. 2. Differential operators of
constant coefficients. Berlin: Springer-Verlag; 1983.

6. Reed M, Simon B. Functional analysis. Vol. 1 of methods of modern mathematical physics. 2nd
ed. New York: Academic Press; 1980.

7. Rudin W. Functional analysis. New York: McGraw Hill; 1973.

8. Zemanian AH. Distribution theory and transform analysis. An introduction to generalized
functions with applications. Dover books on mathematics. New York: McGraw-Hill; 1987.


Chapter 5 
Distributions as Derivatives 
of Functions 


The general form of a distribution on a nonempty open set can be determined in 
a relatively simple way as soon as the topological dual of a certain function space 
is known. As we are going to learn in the second part, the dual of a Hilbert space 
is easily determined. Thus, we use the freedom to define the topology on the test 
function space through various equivalent systems of norms so that we can use the 
simple duality theory for Hilbert spaces. 

This chapter gives the general form of a distribution. Among other things the
results of this chapter show that the space of distributions $\mathcal{D}'(\Omega)$ on a nonempty open
set $\Omega \subseteq \mathbb{R}^n$ is the smallest extension of the space $C(\Omega)$ of continuous functions on
$\Omega$ in which one can differentiate without restrictions in the order of differentiation,
naturally in the weak or distributional sense. Thus, we begin with a discussion
of weak differentiation and mention a few examples. Section 5.2 provides a result
which gives the general form of a distribution on a nonempty open set $\Omega \subseteq \mathbb{R}^n$. How
measures and distributions are related and in which way they differ is explained in
Sect. 5.3. Section 5.4 presents tempered distributions and those which have a compact
support as weak derivatives of functions.


5.1 Weak Derivatives 


In general, a locally integrable function $f$ on a nonempty open set $\Omega \subseteq \mathbb{R}^n$ cannot
be differentiated. But we have learned how to interpret such functions as (regular)
distributions $I_f$ and we have learned to differentiate distributions. Thus, in this way
we know how to differentiate locally integrable functions.


Definition 5.1 The weak or distributional derivative $D^{\alpha} f$ of order $\alpha \in \mathbb{N}^n$ of a
function $f \in L^1_{\mathrm{loc}}(\Omega)$ is a distribution on $\Omega$ defined by the equation

$$\langle D^{\alpha} f, \phi \rangle = (-1)^{|\alpha|} \int f(x)\, D^{\alpha}\phi(x)\, dx \quad \forall\, \phi \in \mathcal{D}(\Omega). \tag{5.1}$$
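A standard illustration of Definition 5.1 is $f(x) = |x|$ on $\mathbb{R}$, whose weak derivative is the sign function. The following is a minimal numerical sketch of Eq. (5.1) for this case, assuming a shifted standard bump function as test function and a midpoint rule for the integrals; the names are hypothetical.

```python
import math

def phi(x, c=0.3):
    """Test function: standard bump centred at c, with support [c - 1, c + 1]."""
    w = 1.0 - (x - c) ** 2
    return math.exp(-1.0 / w) if w > 0.0 else 0.0

def dphi(x, c=0.3):
    """Classical derivative of phi, obtained from the chain rule."""
    u = x - c
    w = 1.0 - u * u
    return phi(x, c) * (-2.0 * u) / w ** 2 if w > 0.0 else 0.0

def midpoint(f, lo, hi, steps=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

# Right-hand side of (5.1) for f(x) = |x| and alpha = 1; split at the kink x = 0.
weak_side = -(midpoint(lambda x: abs(x) * dphi(x), -0.7, 0.0)
              + midpoint(lambda x: abs(x) * dphi(x), 0.0, 1.3))

# Pairing of the candidate weak derivative sign(x) with phi.
sign_side = midpoint(phi, 0.0, 1.3) - midpoint(phi, -0.7, 0.0)
```

Both numbers agree up to quadrature error, in line with $D|x| = \operatorname{sign}(x)$ in the weak sense, although $|x|$ is not classically differentiable at $x = 0$.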




From the section on the derivatives of distributions we recall that on the subspace
$C^{|\alpha|}(\Omega)$ of $L^1_{\mathrm{loc}}(\Omega)$ the weak and the classical derivative agree.

For $m = 0, 1, 2, \ldots$ introduce the space of all weak derivatives of order $|\alpha| \leq m$
of all functions in $L^1_{\mathrm{loc}}(\Omega)$, i.e.,

$$\mathcal{D}'_{\mathrm{reg},m}(\Omega) = \{ D^{\alpha} f : f \in L^1_{\mathrm{loc}}(\Omega),\ |\alpha| \leq m \} \tag{5.2}$$

and then

$$\mathcal{D}'_{\mathrm{reg},\infty}(\Omega) = \bigcup_{m=0}^{\infty} \mathcal{D}'_{\mathrm{reg},m}(\Omega). \tag{5.3}$$

Certainly, the space $\mathcal{D}'_{\mathrm{reg},\infty}(\Omega)$ is a subspace of the space $\mathcal{D}'(\Omega)$ of all distributions
on $\Omega$. In the following section, we show that both spaces are almost equal; more
precisely, we are going to show that locally every distribution $T$ is a weak derivative
of functions in $L^1_{\mathrm{loc}}(\Omega)$. And this statement is still true if we replace $L^1_{\mathrm{loc}}(\Omega)$ by
the much smaller space $C(\Omega)$ of continuous functions on $\Omega$. The term "locally" in
this statement means that the restriction $T_K$ of $T$ to $\mathcal{D}_K(\Omega)$ is a weak derivative of
continuous functions.

Now let us look at some concrete examples. Suppose we are given $m \in \mathbb{N}$ and a
set $\{ f_\alpha : |\alpha| \leq m \}$ of continuous functions on $\Omega$. Define a function $T : \mathcal{D}(\Omega) \to \mathbb{K}$
by

$$T(\phi) = \sum_{|\alpha| \leq m} \int f_\alpha(x)\, D^{\alpha}\phi(x)\, dx \quad \forall\, \phi \in \mathcal{D}(\Omega).$$

Elementary properties of Riemann integrals ensure that $T$ is linear. On each
subspace $\mathcal{D}_K(\Omega)$ one has the estimate $|T(\phi)| \leq C\, p_{K,m}(\phi)$ with the constant
$C = \sum_{|\alpha| \leq m} \int_K |f_\alpha(x)|\, dx$. Thus, $T$ is a distribution of constant local order $m$ and
therefore the order of $T$ is finite and equal to $m$.

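Next, we consider a class of concrete examples of distributions for which the
local order is not constant. To this end recall the representation of $\mathcal{D}(\Omega)$ as the strict
inductive limit of the sequence of complete metrizable spaces $\mathcal{D}_{K_i}(\Omega)$, $i \in \mathbb{N}$:

$$\mathcal{D}(\Omega) = \bigcup_{i=1}^{\infty} \mathcal{D}_{K_i}(\Omega). \tag{5.4}$$

Here $K_i$ is a strictly increasing sequence of compact sets which exhaust $\Omega$. Take a
strictly increasing sequence of integers $m_i$ and choose functions $f_\alpha \in L^1_{\mathrm{loc}}(\Omega)$ with
the following specifications: $\operatorname{supp} f_\alpha \subseteq K_0$ for $|\alpha| \leq m_0$ and $\operatorname{supp} f_\alpha \subseteq K_{i+1} \setminus K_i$
for $m_i < |\alpha| \leq m_{i+1}$, $i = 1, 2, \ldots$. Then define linear functions $T_i : \mathcal{D}_{K_i}(\Omega) \to \mathbb{K}$
by

$$T_i(\phi) = \sum_{|\alpha| \leq m_i} \int f_\alpha(x)\, D^{\alpha}\phi(x)\, dx \quad \forall\, \phi \in \mathcal{D}_{K_i}(\Omega).$$

As above one sees that $T_i$ is continuous on $\mathcal{D}_{K_i}(\Omega)$ with the bound $|T_i(\phi)| \leq
C_i\, p_{K_i,m_i}(\phi)$ where $C_i = \sum_{|\alpha| \leq m_i} \int_{K_i} |f_\alpha(x)|\, dx$. For all $\phi \in \mathcal{D}_{K_i}(\Omega)$, we find

$$T_{i+1}(\phi) = \sum_{|\alpha| \leq m_{i+1}} \int f_\alpha(x)\, D^{\alpha}\phi(x)\, dx = T_i(\phi),$$

since

$$\sum_{m_i < |\alpha| \leq m_{i+1}} \int f_\alpha(x)\, D^{\alpha}\phi(x)\, dx = 0,$$

because of the support properties of the functions $f_\alpha$. Hence, we get a well-defined
continuous linear function $T : \mathcal{D}(\Omega) \to \mathbb{K}$ by defining $T|\mathcal{D}_{K_i}(\Omega) = T_i$ for all $i$.
This distribution $T$ is not of finite order on $\Omega$.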


5.2 Structure Theorem for Distributions


Again it is convenient to start with the representation of the test function space $\mathcal{D}(\Omega)$
as the strict inductive limit of the sequence of complete metrizable spaces $\mathcal{D}_{K_i}(\Omega)$
for a strictly increasing and exhaustive sequence of compact sets $K_i$. Then we can say
that $T$ is a distribution on $\Omega$ if, and only if, $T_i = T|\mathcal{D}_{K_i}(\Omega) \in \mathcal{D}'_{K_i}(\Omega)$ for all $i \in \mathbb{N}$.
This leads to the first step in analyzing the structure of distributions.


Proposition 5.1 Let $\Omega \subseteq \mathbb{R}^n$ be a nonempty open set. Represent the test function
space $\mathcal{D}(\Omega)$ as the strict inductive limit of the complete metrizable spaces $\mathcal{D}_{K_i}(\Omega)$
for a strictly increasing and exhaustive sequence of compact sets $K_i$ (see Eq. (5.4)).
Then the following characterization of distributions holds.

1. A distribution $T \in \mathcal{D}'(\Omega)$ determines, in a unique way, a sequence of functionals
$T_i \in \mathcal{D}'_{K_i}(\Omega)$ which satisfies the compatibility condition

$$T_{i+1}|\mathcal{D}_{K_i}(\Omega) = T_i \quad \forall\, i \in \mathbb{N}. \tag{5.5}$$

2. Conversely, any sequence of functionals $T_i \in \mathcal{D}'_{K_i}(\Omega)$ which satisfies the
compatibility condition (5.5) determines in a unique way a distribution $T$ on $\Omega$ by
defining

$$T|\mathcal{D}_{K_i}(\Omega) = T_i \quad \forall\, i \in \mathbb{N}. \tag{5.6}$$

Proof Since we know $\mathcal{D}_{K_i}(\Omega) \subseteq \mathcal{D}_{K_{i+1}}(\Omega)$, the proof of the first part is obvious.

For the proof of the second part note that the compatibility condition (5.5) ensures
that a linear function $T : \mathcal{D}(\Omega) \to \mathbb{K}$ is well defined by Eq. (5.6). Continuity of $T$
follows from the definition of the inductive topology on $\mathcal{D}(\Omega)$ and the continuity of
the $T_i$.

According to this result, the general form of a distribution is known as soon as
we know the general form of continuous linear functionals on the spaces $\mathcal{D}_K(\Omega)$.
This can be achieved in a fairly easy way on the basis of a fundamental result from
the theory of Hilbert spaces which determines the general form of continuous linear
functionals on a Hilbert space. According to the Riesz-Fréchet Theorem of Part II
(Theorem 16.3), every continuous linear functional on a Hilbert space $\mathcal{H}$ is of the
form $h \mapsto \langle u, h \rangle$, $\forall\, h \in \mathcal{H}$, where $\langle \cdot, \cdot \rangle$ is the inner product of the Hilbert space and
the element $u \in \mathcal{H}$ is determined uniquely by the functional.

As a second input we use the fact that the topology of the space $\mathcal{D}_K(\Omega)$ can be
defined in terms of the filtering system of seminorms $q_{K,m}(\phi) = \sqrt{\langle \phi, \phi \rangle_{K,m}}$ where

$$\langle \phi, \psi \rangle_{K,m} = \sum_{|\alpha| \leq m} \int_K \overline{D^{\alpha}\phi(x)}\, D^{\alpha}\psi(x)\, dx \quad \forall\, \phi, \psi \in \mathcal{D}_K(\Omega),$$

is a scalar product on $\mathcal{D}_K(\Omega)$ (see Subsect. 2.1.1). The completion of the space
$\mathcal{D}_K(\Omega)$ with respect to this scalar product produces a Hilbert space $\mathcal{H}_{K,m}$ whose
scalar product is denoted in the same way.


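Proposition 5.2 Let $T$ be a continuous linear functional on the space $\mathcal{D}_K(\Omega)$,
$\Omega \subseteq \mathbb{R}^n$ open and nonempty, $K \subset \Omega$ compact. Then there is an $m \in \mathbb{N}$ and there
are elements $u_\alpha$ in the Hilbert space $L^2(K)$ of square integrable functions on $K$ such
that

$$T(\phi) = \sum_{|\alpha| \leq m} \int_K u_\alpha(x)\, D^{\alpha}\phi(x)\, dx \quad \forall\, \phi \in \mathcal{D}_K(\Omega),$$

i.e., $T$ is a sum of weak derivatives of square integrable functions on $K$:

$$T = \sum_{|\alpha| \leq m} (-1)^{|\alpha|} D^{\alpha} u_\alpha.$$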


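Proof By definition of the topology on $\mathcal{D}_K(\Omega)$, given $T \in \mathcal{D}'_K(\Omega)$, there are a
constant $C$ and an $m \in \mathbb{N}$ such that $|T(\phi)| \leq C\, q_{K,m}(\phi)$ for all $\phi \in \mathcal{D}_K(\Omega)$. Then $T$ has a
continuous and linear extension $T_K$ to the Hilbert space $\mathcal{H}_{K,m}$, which is obtained as the
completion of $\mathcal{D}_K(\Omega)$ with respect to the norm $q_{K,m}$. As mentioned above, continuous linear
functions on the Hilbert space $\mathcal{H}_{K,m}$ are defined in terms of the scalar product $\langle \cdot, \cdot \rangle_{K,m}$
and some element $u \in \mathcal{H}_{K,m}$. Therefore, we have $T_K(v) = \langle u, v \rangle_{K,m}$ for all $v \in \mathcal{H}_{K,m}$.
Taking the specific form of the scalar product into account, we thus get for all
$\phi \in \mathcal{D}_K(\Omega) \subseteq \mathcal{H}_{K,m}$, since $T_K$ is an extension of $T$,

$$T(\phi) = T_K(\phi) = \langle u, \phi \rangle_{K,m} = \sum_{|\alpha| \leq m} \int_K \overline{D^{\alpha}u(x)}\, D^{\alpha}\phi(x)\, dx.$$

Introducing the functions $u_\alpha = \overline{D^{\alpha}u}$ the formula for $T$ follows.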


Propositions 5.1 and 5.2 together determine the general form of distributions. In
terms of the results in Proposition 5.2 the compatibility condition of Proposition 5.1
could be evaluated more explicitly but we omit this since it is not used later.

Consider, for a moment, the case $n = 1$. If we integrate $u_\alpha \in L^2(K)$ we get
a continuous function $v_\alpha(x) = \int_a^x u_\alpha(y)\, dy$, where $a \in K$ is arbitrary, such that
$D v_\alpha = u_\alpha$. Thus, in the representation formula for $T \in \mathcal{D}'_K(\Omega)$ in Proposition 5.2
we can use continuous functions instead of square integrable functions by increasing
the order of differentiation correspondingly. In particular, this representation is not
unique. Though formally more involved these statements hold for the general case
too.
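Dirac's $\delta$ on $\mathbb{R}$ illustrates this trade between regularity and order of differentiation: $\delta = D\theta$ with the discontinuous Heaviside function $\theta$, and also $\delta = D^2 x_+$ with the continuous function $x_+ = \max\{x, 0\}$ obtained by integrating $\theta$. The following is a minimal numerical sketch, assuming a shifted bump test function and a midpoint rule; all names are hypothetical.

```python
import math

def phi(x, c=0.3):
    """Test function: standard bump centred at c, with support [c - 1, c + 1]."""
    w = 1.0 - (x - c) ** 2
    return math.exp(-1.0 / w) if w > 0.0 else 0.0

def dphi(x, c=0.3):
    """First derivative of phi (chain rule applied to exp(g), g = -1/(1 - u**2))."""
    u = x - c
    w = 1.0 - u * u
    return phi(x, c) * (-2.0 * u) / w ** 2 if w > 0.0 else 0.0

def d2phi(x, c=0.3):
    """Second derivative of phi: (g'' + g'**2) * exp(g)."""
    u = x - c
    w = 1.0 - u * u
    if w <= 0.0:
        return 0.0
    g1 = -2.0 * u / w ** 2
    g2 = -2.0 / w ** 2 - 8.0 * u * u / w ** 3
    return (g2 + g1 * g1) * phi(x, c)

def midpoint(f, lo, hi, steps=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

# <D theta, phi> = -int_0^infty phi'(x) dx   (theta is square integrable on compact sets)
first_order = -midpoint(dphi, 0.0, 1.3)
# <D^2 x_+, phi> = +int_0^infty x phi''(x) dx   (x_+ is continuous)
second_order = midpoint(lambda x: x * d2phi(x), 0.0, 1.3)
```

Both `first_order` and `second_order` approximate $\phi(0) = \langle \delta, \phi \rangle$, so the same distribution is represented once with order one and once, by a continuous function, with order two.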

Collecting the results from above we arrive at the structure theorem for 
distributions. 


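Theorem 5.1 Let $\Omega \subseteq \mathbb{R}^n$ be a nonempty open set and $K_i$ be a strictly increasing
sequence of compact sets which exhaust $\Omega$. $T$ is a distribution on $\Omega$ if, and only if,
there is a sequence of nonnegative integers $m_i$ and for each $i \in \mathbb{N}$ there are elements
$u_{i,\alpha} \in L^2(K_i)$, $|\alpha| \leq m_i$, such that for $i = 0, 1, 2, \ldots$,

$$T_i(\phi) = T(\phi) = \sum_{|\alpha| \leq m_i} \int_{K_i} u_{i,\alpha}(x)\, D^{\alpha}\phi(x)\, dx \quad \forall\, \phi \in \mathcal{D}_{K_i}(\Omega) \tag{5.7}$$

and, for all $\phi \in \mathcal{D}_{K_i}(\Omega)$,

$$\sum_{|\alpha| \leq m_{i+1}} \int_{K_{i+1}} u_{i+1,\alpha}(x)\, D^{\alpha}\phi(x)\, dx = \sum_{|\alpha| \leq m_i} \int_{K_i} u_{i,\alpha}(x)\, D^{\alpha}\phi(x)\, dx. \tag{5.8}$$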


Proof Note that Eq. (5.8) is just the compatibility condition for the sequence of
functionals $T_i \in \mathcal{D}'_{K_i}(\Omega)$ defined in Eq. (5.7) according to Proposition 5.2. Thus, by
Proposition 5.1 and Proposition 5.2 we conclude Theorem 5.1.


5.3 Radon Measures


As previously, $\Omega$ denotes a nonempty open subset of $\mathbb{R}^n$. Introduce the space $C_0(\Omega)$
of all continuous functions $f : \Omega \to \mathbb{R}$ which have a compact support in $\Omega$. For a
compact subset $K$ of $\Omega$ denote by $C_K(\Omega)$ the subspace of all functions in $C_0(\Omega)$
whose support is contained in $K$. On the spaces $C_K(\Omega)$, $K \subset \Omega$ compact, we
use the norms $p_{K,0}$ introduced in Chap. 2. Equip the space $C_0(\Omega)$ with the inductive
limit topology of the spaces $(C_K(\Omega), p_{K,0})$. A continuous linear functional on this
space $C_0(\Omega)$ is called a real Radon measure on $\Omega$. In more concrete terms one has
the following characterization.


Corollary 5.1 A linear functional $\mu : C_0(\Omega) \to \mathbb{R}$ is a real Radon measure on $\Omega$
if, and only if, for every compact subset $K \subset \Omega$ there is a constant $C$ such that

$$|\mu(f)| \leq C\, p_{K,0}(f) \quad \forall\, f \in C_K(\Omega).$$


Obviously, one has $\mathcal{D}_K(\Omega) \subseteq C_K(\Omega)$ and $\mathcal{D}(\Omega) \subseteq C_0(\Omega)$ and the natural
embeddings are continuous. Hence, every real Radon measure is a distribution.

Now we discuss some order theoretic properties of the test function space $C_0(\Omega)$
for Radon measures which the test function space $\mathcal{D}(\Omega)$ for distributions does not
have. Denote by $C_{0,+}(\Omega)$ the set of all nonnegative functions in $C_0(\Omega)$. Given $f \in
C_0(\Omega)$, define $f_\pm(x) = \max\{\pm f(x), 0\}$. It follows that $f_\pm \in C_{0,+}(\Omega)$ and $f =
f_+ - f_-$. This shows

$$C_0(\Omega) = C_{0,+}(\Omega) - C_{0,+}(\Omega).$$

We deduce that every real Radon measure $\mu$ is the difference of two positive Radon
measures $\mu_+$ and $\mu_-$: $\mu = \mu_+ - \mu_-$. Such decompositions do not hold in distribution
theory, neither on the level of test functions nor on the level of distributions. In
general, the positive and negative parts $f_\pm$ of a continuously differentiable real-valued
function are not differentiable (take, for instance, the example
of the sine function). Nevertheless, there is an interesting order theoretic implication
for distributions.


Theorem 5.2 Every nonnegative linear form $T : \mathcal{D}(\Omega) \to \mathbb{R}$, i.e., $T(\phi) \geq 0$ for
all nonnegative $\phi \in \mathcal{D}(\Omega)$, is the restriction of a positive Radon measure $\mu$ to $\mathcal{D}(\Omega)$,
and thus in particular is continuous.


Proof Suppose that $T$ is a nonnegative linear function $\mathcal{D}(\Omega) \to \mathbb{R}$. Introduce
the restrictions $T_K$ of $T$ to $\mathcal{D}_K(\Omega)$. Clearly, $T_K$ is a nonnegative linear function
on $\mathcal{D}_K(\Omega)$ and the net $T_K$, $K \subset \Omega$ compact, satisfies the compatibility condition
$T_{K_2}|\mathcal{D}_{K_1}(\Omega) = T_{K_1}$ for all compact sets $K_1 \subseteq K_2 \subset \Omega$.

Given a compact set $K \subset \Omega$, there are a compact set $K' \subset \Omega$ such that $K \subset K'$
and a nonnegative function $\psi \in \mathcal{D}_{K'}(\Omega)$ which is equal to $1$ on $K$. Therefore, the
estimate

$$-\psi(x)\, p_{K,0}(\phi) \leq \phi(x) \leq \psi(x)\, p_{K,0}(\phi)$$

holds for all $x \in \Omega$ and all $\phi \in \mathcal{D}_K(\Omega)$. Since $T_{K'}$ is nonnegative, it preserves this
estimate:

$$-T_{K'}(\psi)\, p_{K,0}(\phi) \leq T_{K'}(\phi) \leq T_{K'}(\psi)\, p_{K,0}(\phi)$$

and thus, since $T_K(\phi) = T_{K'}(\phi)$ for all $\phi \in \mathcal{D}_K(\Omega)$,

$$|T_K(\phi)| \leq T_{K'}(\psi)\, p_{K,0}(\phi)$$

for all $\phi \in \mathcal{D}_K(\Omega)$. This shows continuity of $T_K$ for every compact subset $K \subset \Omega$.
Hence, $T$ is a distribution on $\Omega$, of order 0.

This continuity estimate for $T_K$ allows us to extend $T_K$ to a nonnegative linear
function $\mu_K : C_K(\Omega) \to \mathbb{R}$ with the same bound. This extension process preserves
the compatibility condition of the net $T_K$, $K \subset \Omega$ compact. Thus, we can define a
continuous linear function $\mu : C_0(\Omega) \to \mathbb{R}$ by setting $\mu|C_K(\Omega) = \mu_K$ for $K \subset \Omega$
compact. Since each $\mu_K$ is nonnegative, $\mu$ is a nonnegative Radon measure on $\Omega$,
and by construction one has $\mu|\mathcal{D}(\Omega) = T$.

Further details of the proof are given in [1, 2].




5.4 The Case of Tempered and Compactly 
Supported Distributions 


The results on the structure of distributions show that locally every distribution is a
weak derivative of functions. In the case of tempered distributions and those distributions
which have a compact support this result holds globally, as we are going to
prove.

For the case of distributions with compact support this is fairly obvious. We have
learned that a distribution $T$ on a nonempty open set $\Omega$ with compact support is a
continuous linear functional on the test function space $\mathcal{E}(\Omega)$, i.e., there is a compact
subset $K \subset \Omega$ and there are a constant $C \in \mathbb{R}^+$ and $m \in \mathbb{N}$ such that $|T(\phi)| \leq
C\, q_{K,m}(\phi)$ for all $\phi \in \mathcal{E}(\Omega)$. Here, we have used again the fact that the two filtering
systems of seminorms $\{p_{K,m} : m = 0, 1, 2, \ldots\}$ and $\{q_{K,m} : m = 0, 1, 2, \ldots\}$ are
equivalent. Now we can proceed as in Proposition 5.2 and conclude. Note however,
that the distribution and the functions representing this distribution through a process
of taking weak derivatives need not have the same support. As an example consider
Dirac's delta function $\delta$ which has its support in the point $x = 0$. And we have
learned that $\delta$ can be represented as the weak derivative $\theta'$ of the Heaviside function
$\theta$ which has its support in $\mathbb{R}^+$.

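By definition, a tempered distribution $T$ on $\Omega \subseteq \mathbb{R}^n$ is a linear functional $T :
\mathcal{S}(\Omega) \to \mathbb{K}$ for which there are constants $C \in \mathbb{R}^+$ and $m, k \in \mathbb{N}$ such that

$$|T(\phi)| \leq C\, p_{m,k}(\phi) \quad \forall\, \phi \in \mathcal{S}(\Omega).$$

Again, the filtering system of norms $\{p_{m,k} : m, k = 0, 1, 2, \ldots\}$ is equivalent to the
filtering system $\{q_{m,k} : m, k = 0, 1, 2, \ldots\}$ of norms $q_{m,k}(\phi) = \sqrt{\langle \phi, \phi \rangle_{m,k}}$ defined
by the scalar product

$$\langle \phi, \psi \rangle_{m,k} = \sum_{|\alpha| \leq k} \int_\Omega \overline{D^{\alpha}\phi(x)}\, D^{\alpha}\psi(x)\, (1 + x^2)^{m}\, dx \quad \forall\, \phi, \psi \in \mathcal{S}(\Omega). \tag{5.9}$$

Thus, we can assume $|T| \leq C\, q_{m,k}$ for some constant $C$ and some nonnegative
integers $m$ and $k$. This allows us to proceed as in the proof of Proposition 5.2. Thus,
there is an element $u$ in the Hilbert space $\mathcal{H}_{m,k}$, defined as the completion of $\mathcal{S}(\Omega)$
with respect to the norm $q_{m,k}$, such that

$$T(\phi) = \langle u, \phi \rangle_{m,k} = \sum_{|\alpha| \leq k} \int_\Omega \overline{D^{\alpha}u(x)}\, D^{\alpha}\phi(x)\, (1 + x^2)^{m}\, dx \quad \forall\, \phi \in \mathcal{S}(\Omega).$$

Introduce the function $u_\alpha(x) = (1 + x^2)^{m}\, \overline{D^{\alpha}u(x)}$ on $\Omega$. Since $u \in \mathcal{H}_{m,k}$ we know,
for all $|\alpha| \leq k$, that

$$\int_\Omega |u_\alpha(x)|^2\, (1 + x^2)^{-m}\, dx \tag{5.10}$$

is finite and thus we formulate the structure theorem for tempered distributions.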


Theorem 5.3 Let $\Omega \subseteq \mathbb{R}^n$ be an open nonempty set. $T$ is a tempered distribution
on $\Omega$ if, and only if, there are nonnegative integers $m, k$ and there are measurable
functions $u_\alpha$ on $\Omega$, $|\alpha| \leq k$, for which the integrals (5.10) are finite such that

$$T(\phi) = \sum_{|\alpha| \leq k} \int_\Omega u_\alpha(x)\, D^{\alpha}\phi(x)\, dx \quad \forall\, \phi \in \mathcal{S}(\Omega), \tag{5.11}$$

i.e.,

$$T = \sum_{|\alpha| \leq k} (-1)^{|\alpha|} D^{\alpha} u_\alpha.$$


Proof In the Exercises we show that Eq. (5.11) indeed defines a tempered distribution
on $\Omega$. That conversely every tempered distribution is of this form we have shown
above. Thus we conclude.

Note that this theorem says that tempered distributions are globally of finite order.


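Corollary 5.2 Every tempered distribution $T$ on $\mathbb{R}^n$ is a finite order derivative of a
continuous polynomially bounded function $t$, i.e., there is some multi-index $\gamma \in \mathbb{N}^n$
such that for all $\phi \in \mathcal{S}(\mathbb{R}^n)$

$$T(\phi) = \int t(x)\, D^{\gamma}\phi(x)\, dx \tag{5.12}$$

and one has $\operatorname{supp} T \subseteq \operatorname{supp} t$.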
Proof According to Theorem 5.3, every $T \in \mathcal{S}'(\mathbb{R}^n)$ is of the form

$$T = \sum_{|\alpha| \leq k} (-1)^{|\alpha|} D^{\alpha} u_\alpha,$$

with measurable functions $u_\alpha$ which satisfy condition (5.10), where we have omitted
the embedding mapping $I$. An inverse of differentiation is integration. So for some
reference point $x^0 = (x_1^0, \ldots, x_n^0)$ introduce the partial integration $I_j$ on integrable
functions $f$,

$$(I_j f)(x_1, \ldots, x_n) = \int_{x_j^0}^{x_j} f(x_1, \ldots, x_{j-1}, \xi_j, x_{j+1}, \ldots, x_n)\, d\xi_j.$$

Clearly, $D_j I_j f = f$ and $I_1 \cdots I_n f$ is continuous for integrable $f$ and polynomially
bounded if $f$ is. Furthermore, using multi-index notation in a natural way we have
$D^{\beta} I^{\beta} f = f$. Now choose some multi-index $\gamma = (\gamma_1, \ldots, \gamma_n)$ such that $\gamma_j \geq
\alpha_j + 1$ for $j = 1, \ldots, n$ for all multi-indices $\alpha$ which occur in (5.10) and then
multi-indices $\beta_\alpha$ such that $\gamma = \alpha + \beta_\alpha$. By choice of $\gamma$ we ensure that for all $\alpha$ one has
$\beta_\alpha \geq (1, 1, \ldots, 1)$ (component-wise). Finally, define

$$t(x) = (-1)^{|\gamma|} \sum_{|\alpha| \leq k} (-1)^{|\alpha|}\, I^{\beta_\alpha} u_\alpha(x).$$

It follows that $t$ is a polynomially bounded continuous function which satisfies

$$(-1)^{|\gamma|} D^{\gamma} t = \sum_{|\alpha| \leq k} (-1)^{|\alpha|} D^{\alpha+\beta_\alpha} I^{\beta_\alpha} u_\alpha = \sum_{|\alpha| \leq k} (-1)^{|\alpha|} D^{\alpha} D^{\beta_\alpha} I^{\beta_\alpha} u_\alpha = \sum_{|\alpha| \leq k} (-1)^{|\alpha|} D^{\alpha} u_\alpha = T.$$

The support condition can be shown indirectly using continuity of the function $t$.


5.5 Exercises 


1. Prove: The two filtering systems of norms $P = \{p_{m,k} : m, k = 0, 1, 2, \ldots\}$ and
$Q = \{q_{m,k} : m, k = 0, 1, 2, \ldots\}$ on $\mathcal{S}(\Omega)$ are equivalent.

2. Show that Eq. (5.11) defines a tempered distribution.

3. Find an example of a distribution which is not a tempered distribution.

Hints: Try regular distributions.

4. Show: Every continuous polynomially bounded function on $\mathbb{R}^n$ defines a
distribution in $\mathcal{S}'(\mathbb{R}^n) \cap \mathcal{D}'_{\mathrm{reg}}(\mathbb{R}^n)$, but not every continuous function on $\mathbb{R}^n$ which
defines a distribution in $\mathcal{S}'(\mathbb{R}^n) \cap \mathcal{D}'_{\mathrm{reg}}(\mathbb{R}^n)$ is polynomially bounded.

Hints: Try the function $f(x) = e^{x} \sin e^{x} = -\frac{d}{dx}(\cos e^{x})$ on $\mathbb{R}$.


References 


1. Donoghue WF. Distributions and Fourier transforms. New York: Academic; 1969. 
2. Schwartz L. Théorie des distributions. Vol. 1, 2nd ed. Paris: Hermann; 1957. 


Chapter 6 
Tensor Products 


The tensor product of distributions is a very important tool in the analysis of distributions.
We will use it mainly in the definition of the convolution for distributions
which in turn has many important applications, some of which we will discuss in later
chapters (approximation of distributions by smooth functions, analysis of partial differential
operators with constant coefficients). The tensor product for distributions is
naturally based on the tensor product of the underlying test function spaces and their
completions. Accordingly, we start by developing the theory of tensor products of
test function spaces to the extent which is needed later. The following section gives
the definition and the main properties of the tensor product for distributions. We
assume that the reader is familiar with the definition of the algebraic tensor product
of general vector spaces. A short reminder is given in Sect. 18.2.


6.1 Tensor Product for Test Function Spaces 


In the chapter on (elementary aspects of) calculus for distributions we discussed
among other things a product between functions, between distributions and certain
classes of functions, and between distributions if the distributions involved satisfied
certain restrictions. This pointwise product assigns to two functions (or distributions)
on a set $\Omega \subseteq \mathbb{R}^n$ a new function (distribution) on the same set $\Omega$.

On the other side, the tensor product assigns to two functions (distributions) $f_i$
on (in general) two different open sets $\Omega_i$, $i = 1, 2$, a new function (distribution) on
the product set $\Omega_1 \times \Omega_2$. To be more specific, assume that $\Omega_1 \subseteq \mathbb{R}^{n_1}$ and $\Omega_2 \subseteq \mathbb{R}^{n_2}$
are two nonempty open sets and $\phi_i \in \mathcal{D}(\Omega_i)$ are two test functions on $\Omega_1$ and $\Omega_2$,
respectively. The tensor product of $\phi_1$ and $\phi_2$ is the function $\phi_1 \otimes \phi_2 : \Omega_1 \times \Omega_2 \to \mathbb{K}$
defined by

$$\phi_1 \otimes \phi_2(x_1, x_2) = \phi_1(x_1)\, \phi_2(x_2) \quad \forall\, (x_1, x_2) \in \Omega_1 \times \Omega_2. \tag{6.1}$$
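Definition (6.1) translates directly into code. The following is a minimal sketch, assuming standard bump functions on $\mathbb{R}$ as the two factors; the helper names are hypothetical.

```python
import math

def bump(x):
    """Standard bump function with support [-1, 1]."""
    w = 1.0 - x * x
    return math.exp(-1.0 / w) if w > 0.0 else 0.0

def shifted(c):
    """Test function x -> bump(x - c), with support [c - 1, c + 1]."""
    return lambda x: bump(x - c)

def tensor(phi1, phi2):
    """Tensor product (x1, x2) -> phi1(x1) * phi2(x2), cf. Eq. (6.1)."""
    return lambda x1, x2: phi1(x1) * phi2(x2)

# A single tensor product; its support is the square [-1, 1] x [1, 3].
product = tensor(shifted(0.0), shifted(2.0))

def finite_sum(x1, x2):
    """A finite sum of tensor products, again a function of (x1, x2)."""
    return product(x1, x2) + 0.5 * tensor(shifted(1.0), shifted(0.0))(x1, x2)
```

The function `product` vanishes as soon as one of its arguments leaves the support of the corresponding factor, so its support is the product of the two supports and hence compact.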


Certainly, the tensor product ¢; ® 2 is a C™-function on 2; x {2 which has a 
compact support; thus @; ® ¢2 € D(S2; x $22) for all 6; € D(Q;). The vector space 


© Springer International Publishing Switzerland 2015 73 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_6 


74 6 Tensor Products 


spanned by all these tensor products $; ® ¢2 is denoted by D(S2,)@D(22). A general 
element in D({2;) ® D(&2) is of the form 


N 
Vib @vi. 6 © DQ), Wie DQ), 1=1,2..,N; (62) 
i=1 
and it follows that the algebraic tensor product D(2;) ® D(S22) of the test function 
spaces D(§2)) and D({22) is contained in the test function space over the product set 
92; X {23: 
D(21) ® D(22) C D(Q, x 22). 


As a subspace of the test function space $\mathcal{D}(\Omega_1 \times \Omega_2)$ the tensor product space carries
naturally the relative topology of $\mathcal{D}(\Omega_1 \times \Omega_2)$. A first important observation is that
this tensor product space is dense in the test function space $\mathcal{D}(\Omega_1 \times \Omega_2)$.
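The definition (6.1) can be illustrated numerically. The following is only a sketch under stated assumptions: the helper names `bump` and `tensor` are hypothetical and not taken from the text, and the factors are sampled on a finite grid. It checks that the sampled tensor product is the outer product of the sampled factors and that it vanishes outside the product of the supports.

```python
import numpy as np

def bump(x, center=0.0, radius=1.0):
    """Smooth bump supported in [center - radius, center + radius] (illustrative helper)."""
    t = (np.asarray(x, dtype=float) - center) / radius
    out = np.zeros_like(t)
    inside = np.abs(t) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

def tensor(phi1, phi2):
    """Return the function (x1, x2) -> phi1(x1) * phi2(x2), as in Eq. (6.1)."""
    return lambda x1, x2: phi1(x1) * phi2(x2)

x1 = np.linspace(-2.0, 2.0, 81)
x2 = np.linspace(-1.0, 3.0, 81)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

phi1 = lambda x: bump(x, 0.0, 1.0)   # supp phi1 = [-1, 1]
phi2 = lambda x: bump(x, 1.0, 0.5)   # supp phi2 = [0.5, 1.5]
values = tensor(phi1, phi2)(X1, X2)

# On a grid the tensor product is the outer product of the sampled factors.
outer = np.outer(phi1(x1), phi2(x2))
max_diff = float(np.max(np.abs(values - outer)))

# The tensor product vanishes outside supp(phi1) x supp(phi2).
outside = (np.abs(X1) >= 1.0) | (np.abs(X2 - 1.0) >= 0.5)
support_ok = bool(np.all(values[outside] == 0.0))
```

A finite sum of such outer products is the sampled version of a general element (6.2) of the algebraic tensor product.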


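Proposition 6.1 Suppose that $\Omega_i \subset \mathbb{R}^{n_i}$, $i = 1,2$, are nonempty open sets. Then the
tensor product space $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$ of the test function spaces $\mathcal{D}(\Omega_i)$ is sequentially
dense in the test function space $\mathcal{D}(\Omega_1 \times \Omega_2)$ over the product set $\Omega_1 \times \Omega_2$, i.e.,

$$\overline{\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)}^{\,\tau} = \mathcal{D}(\Omega_1 \times \Omega_2), \tag{6.3}$$

where $\tau$ indicates that the closure is taken with respect to the topology of the space
$\mathcal{D}(\Omega_1 \times \Omega_2)$.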


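Proof We have to show that any given $\psi \in \mathcal{D}(\Omega_1 \times \Omega_2)$ is the limit of a sequence of
elements in $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$, in the sense of uniform convergence for all derivatives
on every compact subset $K \subset \Omega_1 \times \Omega_2$. This is done in several steps.

Given $\psi \in \mathcal{D}(\Omega_1 \times \Omega_2)$ we introduce in a first step the sequence of auxiliary
functions $\psi_k$ defined by

$$\psi_k(z) = \int_{\mathbb{R}^n} e_k(z - \xi)\,\psi(\xi)\,d\xi = \int_{\mathbb{R}^n} e_k(\xi)\,\psi(z - \xi)\,d\xi.$$

Here the following notation is used: $n = n_1 + n_2$, $z = (x_1, x_2) \in \Omega_1 \times \Omega_2$ and

$$e_k(z) = \left(\frac{k}{\sqrt{\pi}}\right)^{n} e^{-k^2 z^2}.$$

Observe that $e_k \in C^\infty(\mathbb{R}^n)$ and $\int_{\mathbb{R}^n} e_k(z)\,dz = 1$ for all $k \in \mathbb{N}$. Without giving the
details of the straightforward proof we state, for all $\alpha \in \mathbb{N}^n$, for the derivatives,

$$D^\alpha \psi_k(z) = \int_{\mathbb{R}^n} e_k(\xi)\,(D^\alpha \psi)(z - \xi)\,d\xi$$

for all $k \in \mathbb{N}$. Since all derivatives $D^\alpha\psi$ of $\psi$ are uniformly continuous on $\mathbb{R}^n$,
given $\varepsilon > 0$ there is a $\delta > 0$ such that $|(D^\alpha\psi)(z) - (D^\alpha\psi)(z - \xi)| < \varepsilon$ for all
$z \in \mathbb{R}^n$ and all $\xi \in \mathbb{R}^n$ with $|\xi| < \delta$. The normalization of $e_k$ allows us to write
$(D^\alpha\psi)(z) - (D^\alpha\psi_k)(z)$ as the integral $\int e_k(\xi)\,[(D^\alpha\psi)(z) - (D^\alpha\psi)(z - \xi)]\,d\xi$ which
can be estimated, in absolute value, by

$$\int_{|\xi| \le \delta} e_k(\xi)\,|(D^\alpha\psi)(z) - (D^\alpha\psi)(z - \xi)|\,d\xi + \int_{|\xi| > \delta} e_k(\xi)\,|(D^\alpha\psi)(z) - (D^\alpha\psi)(z - \xi)|\,d\xi.$$

By choice of $\delta$, using the notation $\|D^\alpha\psi\|_\infty = \sup_{z \in \mathbb{R}^n} |(D^\alpha\psi)(z)|$, this estimate
can be continued by

$$\le \varepsilon \int_{|\xi| \le \delta} e_k(\xi)\,d\xi + 2\,\|D^\alpha\psi\|_\infty \int_{|\xi| > \delta} e_k(\xi)\,d\xi.$$

The first integral is obviously bounded by 1 while for the second integral we find

$$\int_{|\xi| > \delta} e_k(\xi)\,d\xi = \pi^{-n/2} \int_{|z| > k\delta} e^{-z^2}\,dz < \varepsilon$$

for all $k \ge k_0$ for some sufficiently large $k_0 \in \mathbb{N}$. Therefore, uniformly in $z \in \mathbb{R}^n$,
for all $k \ge k_0$,

$$|(D^\alpha\psi)(z) - (D^\alpha\psi_k)(z)| \le (1 + 2\,\|D^\alpha\psi\|_\infty)\,\varepsilon.$$

We deduce: Every derivative $D^\alpha\psi$ of $\psi$ is the uniform limit of the sequence of
corresponding derivatives $D^\alpha\psi_k$ of the sequence $\psi_k$.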
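The first step of the proof can be observed numerically in one dimension. The sketch below is an illustration only: it assumes $n = 1$, takes the standard bump function as test function, and replaces the integral by a Riemann sum on a grid. The names `bump` and `gaussian_kernel` are hypothetical.

```python
import numpy as np

def bump(z):
    """Test function on R supported in [-1, 1] (illustrative choice of psi)."""
    out = np.zeros_like(z)
    inside = np.abs(z) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - z[inside] ** 2))
    return out

def gaussian_kernel(z, k):
    """e_k(z) = (k / sqrt(pi)) * exp(-k^2 z^2) for n = 1; its integral is 1."""
    return (k / np.sqrt(np.pi)) * np.exp(-(k * z) ** 2)

z = np.linspace(-4.0, 4.0, 4001)      # symmetric grid with an odd number of points
h = z[1] - z[0]
psi = bump(z)

sup_errors = []
for k in (2, 8, 32):
    e_k = gaussian_kernel(z, k)
    # Riemann-sum approximation of psi_k(z) = int e_k(z - xi) psi(xi) d xi.
    psi_k = np.convolve(psi, e_k, mode="same") * h
    sup_errors.append(float(np.max(np.abs(psi - psi_k))))
```

The entries of `sup_errors` decrease as $k$ grows, in line with the uniform convergence $\psi_k \to \psi$ established above.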

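In a second step, by using special properties of the exponential function, we prepare
the approximation of the elements of the sequence $\psi_k$ by functions in the tensor
product space $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$. To this end we use the power series representation
of the exponential function and introduce, for each $k \in \mathbb{N}$, the sequence of functions
defined by the formula

$$\psi_{k,N}(z) = \left(\frac{k}{\sqrt{\pi}}\right)^{n} \sum_{i=0}^{N} \frac{1}{i!} \int_{\mathbb{R}^n} \left[-k^2 (z - \xi)^2\right]^{i} \psi(\xi)\,d\xi.$$

As in the first step the derivative of these functions can easily be calculated. One
finds

$$D^\alpha\psi_{k,N}(z) = \left(\frac{k}{\sqrt{\pi}}\right)^{n} \sum_{i=0}^{N} \frac{1}{i!} \int_{\mathbb{R}^n} \left[-k^2 (z - \xi)^2\right]^{i} (D^\alpha\psi)(\xi)\,d\xi$$

and therefore we estimate, for all $z \in \mathbb{R}^n$ such that $|z - \xi| \le R$ for all $\xi$ in the support
of $\psi$ for some finite $R$, as follows:

$$|D^\alpha\psi_k(z) - D^\alpha\psi_{k,N}(z)| \le \left(\frac{k}{\sqrt{\pi}}\right)^{n} \sum_{i=N+1}^{\infty} \frac{1}{i!} \int_{\mathbb{R}^n} \left|k^2 (z - \xi)^2\right|^{i} |(D^\alpha\psi)(\xi)|\,d\xi \le I_N(k, R) \int_{\mathbb{R}^n} |(D^\alpha\psi)(\xi)|\,d\xi$$

where

$$I_N(k, R) = \left(\frac{k}{\sqrt{\pi}}\right)^{n} \sum_{i=N+1}^{\infty} \frac{(kR)^{2i}}{i!} \to 0 \quad \text{as } N \to \infty.$$

Using the binomial formula to expand $(z - \xi)^{2i}$ and evaluating the resulting
integrals, we see that the functions $\psi_{k,N}$ are actually polynomials in $z$ of degree
$\le 2N$, and recalling that $z$ stands for the pair of variables $(x_1, x_2) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$, we
see that these functions are of the form

$$\psi_{k,N}(x_1, x_2) = \sum_{|\alpha|, |\beta| \le 2N} C_{\alpha,\beta}\, x_1^{\alpha} x_2^{\beta}.$$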


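Since $\psi \in \mathcal{D}(\Omega_1 \times \Omega_2)$, there are compact subsets $K_j \subset \Omega_j$ such that $\operatorname{supp}\psi \subseteq
K_1 \times K_2$. Now choose test functions $\chi_j \in \mathcal{D}(\Omega_j)$ which are equal to 1 on $K_j$,
$j = 1,2$. It follows that $(\chi_1 \otimes \chi_2) \cdot \psi = \psi$ and

$$\phi_{k,N} = (\chi_1 \otimes \chi_2) \cdot \psi_{k,N} \in \mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2) \qquad \forall\, k, N \in \mathbb{N},$$

since the $\psi_{k,N}$ are polynomials.

For any compact set $K \subset \mathbb{R}^n$ there is a positive real number $R$ such that $|z - \xi| \le R$
for all $z \in K$ and all $\xi \in K_1 \times K_2$. From the estimates of the second step, for all
$\alpha \in \mathbb{N}^n$, we know that $D^\alpha\psi_{k,N}(z)$ converges uniformly in $z \in K$ to $D^\alpha\psi_k(z)$. Using
Leibniz' rule we deduce

$$\lim_{N \to \infty} \phi_{k,N} = (\chi_1 \otimes \chi_2) \cdot \psi_k \equiv \phi_k \quad \text{in } \mathcal{D}(\Omega_1 \times \Omega_2).$$

Again using Leibniz' rule we deduce from the estimates of the first step that

$$\lim_{k \to \infty} \phi_k = (\chi_1 \otimes \chi_2) \cdot \psi = \psi \quad \text{in } \mathcal{D}(\Omega_1 \times \Omega_2)$$

and thus we conclude.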

On the algebraic tensor product $E \otimes F$ of two Hausdorff locally convex topological
vector spaces $E$ and $F$ over the same field, several interesting locally convex
topologies can be defined. We discuss here briefly the projective tensor product topology,
which plays an important role in the definition and study of tensor products for
distributions. Let $P$ (respectively $Q$) be the filtering system of seminorms defining
the topology of the space $E$ (respectively of $F$). Recall that the general element $\chi$ in
$E \otimes F$ is of the form

$$\chi = \sum_{i=1}^{m} e_i \otimes f_i \quad \text{with } e_i \in E \text{ and } f_i \in F,\ i = 1, \ldots, m, \text{ any } m \in \mathbb{N}. \tag{6.4}$$

Note that this representation of the element $\chi$ in terms of factors $e_i \in E$ and $f_i \in F$
is not unique. In the following definition of a seminorm on $E \otimes F$, this is taken into
account by taking the infimum over all such representations of $\chi$. Now given two
seminorms $p \in P$ and $q \in Q$, the projective tensor product $p \otimes_\pi q$ of $p$ and $q$ is
defined by

$$p \otimes_\pi q(\chi) = \inf\left\{ \sum_{i=1}^{m} p(e_i)\,q(f_i) : \chi = \sum_{i=1}^{m} e_i \otimes f_i \right\}. \tag{6.5}$$

In the Exercises we show that this formula indeed defines a seminorm on the tensor
product $E \otimes F$. It follows immediately that

$$p \otimes_\pi q(e \otimes f) = p(e)\,q(f) \qquad \forall\, e \in E,\ \forall\, f \in F. \tag{6.6}$$




From the definition it is evident that $p \otimes_\pi q \le p' \otimes_\pi q$ and $p \otimes_\pi q \le p \otimes_\pi q'$ whenever
$p, p' \in P$ satisfy $p \le p'$ and $q, q' \in Q$ satisfy $q \le q'$. Therefore the system

$$P \otimes_\pi Q = \{\, p \otimes_\pi q : p \in P,\ q \in Q \,\} \tag{6.7}$$

of seminorms on $E \otimes F$ is filtering and thus defines a locally convex topology on
$E \otimes F$, called the projective tensor product topology. The vector space $E \otimes F$
equipped with this topology is denoted by

$$E \otimes_\pi F$$

and is called the projective tensor product of the spaces $E$ and $F$.

This definition applies in particular to the test function spaces $E = \mathcal{D}(\Omega_1)$, $\Omega_1 \subset
\mathbb{R}^{n_1}$, and $F = \mathcal{D}(\Omega_2)$, $\Omega_2 \subset \mathbb{R}^{n_2}$. Thus we arrive at the projective tensor product
$\mathcal{D}(\Omega_1) \otimes_\pi \mathcal{D}(\Omega_2)$ of these test function spaces. The following theorem identifies the
completion of this space which plays an important role in the definition of tensor
products for distributions. The general construction of the completion is given in
Appendix A.


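Theorem 6.1 Assume $\Omega_i \subseteq \mathbb{R}^{n_i}$ are nonempty open sets. The completion of the
projective tensor product $\mathcal{D}(\Omega_1) \otimes_\pi \mathcal{D}(\Omega_2)$ of the test function spaces over $\Omega_i$ is
equal to the test function space $\mathcal{D}(\Omega_1 \times \Omega_2)$ over the product $\Omega_1 \times \Omega_2$ of the sets
$\Omega_i$:

$$\mathcal{D}(\Omega_1)\,\tilde{\otimes}_\pi\,\mathcal{D}(\Omega_2) = \mathcal{D}(\Omega_1 \times \Omega_2). \tag{6.8}$$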


6.2 Tensor Product for Distributions 


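Knowing the tensor product of two functions $f, g \in L^1_{\mathrm{loc}}(\Omega_i)$, we are going to
define the tensor product for distributions in such a way that it is compatible with
the embedding of functions into the space of distributions and the tensor product for
functions. Traditionally the same symbol $\otimes$ is used to denote the tensor product for
distributions and for functions. Thus our compatibility condition means $I_f \otimes I_g =
I_{f \otimes g}$ for all $f, g \in L^1_{\mathrm{loc}}(\Omega_i)$. Since we know how to evaluate $I_{f \otimes g}$, we get

$$\begin{aligned}
\langle I_f \otimes I_g, \phi \otimes \psi \rangle &= \langle I_{f \otimes g}, \phi \otimes \psi \rangle \\
&= \int_{\Omega_1 \times \Omega_2} (f \otimes g)(x, y)\,(\phi \otimes \psi)(x, y)\,dx\,dy \\
&= \int_{\Omega_1 \times \Omega_2} f(x)\,g(y)\,\phi(x)\,\psi(y)\,dx\,dy = \langle I_f, \phi \rangle \langle I_g, \psi \rangle
\end{aligned}$$

for all $\phi \in \mathcal{D}(\Omega_1)$ and all $\psi \in \mathcal{D}(\Omega_2)$. Hence the compatibility with the embedding
is assured as soon as the tensor product for distributions is required to satisfy the
following identity, for all $T \in \mathcal{D}'(\Omega_1)$, all $S \in \mathcal{D}'(\Omega_2)$, all $\phi \in \mathcal{D}(\Omega_1)$, and all
$\psi \in \mathcal{D}(\Omega_2)$,

$$\langle T \otimes S, \phi \otimes \psi \rangle = \langle T, \phi \rangle \langle S, \psi \rangle. \tag{6.9}$$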




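Since the tensor product is to be defined in such a way that it is a continuous linear
functional on the test function space over the product set, this identity determines the
tensor product of two distributions immediately on the tensor product $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$
of the test function spaces by linearity:

$$\langle T \otimes S, \chi \rangle = \sum_{i=1}^{N} \langle T, \phi_i \rangle \langle S, \psi_i \rangle \qquad \forall\, \chi = \sum_{i=1}^{N} \phi_i \otimes \psi_i \in \mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2). \tag{6.10}$$

Thus we know the tensor product on the dense subspace $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$ of $\mathcal{D}(\Omega_1 \times
\Omega_2)$ and this identity allows us to read off the natural continuity requirement for
$T \otimes S$. Suppose $K_i \subset \Omega_i$ are compact subsets. Then there are constants $C_i \in \mathbb{R}^+$
and integers $m_i$ such that $|\langle T, \phi \rangle| \le C_1\, p_{K_1,m_1}(\phi)$ for all $\phi \in \mathcal{D}_{K_1}(\Omega_1)$ and $|\langle S, \psi \rangle| \le
C_2\, p_{K_2,m_2}(\psi)$ for all $\psi \in \mathcal{D}_{K_2}(\Omega_2)$ and thus, using the abbreviations $p_i = p_{K_i,m_i}$,

$$|\langle T \otimes S, \chi \rangle| = \Big| \sum_{i=1}^{N} \langle T, \phi_i \rangle \langle S, \psi_i \rangle \Big| \le \sum_{i=1}^{N} C_1\, p_1(\phi_i)\, C_2\, p_2(\psi_i)$$

for all representations of $\chi = \sum_{i=1}^{N} \phi_i \otimes \psi_i$, and it follows that

$$|\langle T \otimes S, \chi \rangle| \le C_1 C_2 \inf\left\{ \sum_{i=1}^{N} p_1(\phi_i)\,p_2(\psi_i) : \chi = \sum_{i=1}^{N} \phi_i \otimes \psi_i \right\},$$

that is,

$$|\langle T \otimes S, \chi \rangle| \le C_1 C_2\, (p_1 \otimes_\pi p_2)(\chi). \tag{6.11}$$

Hence the tensor product $T \otimes S$ of the distributions $T$ on $\Omega_1$ and $S$ on $\Omega_2$ is a continuous
linear function $\mathcal{D}(\Omega_1) \otimes_\pi \mathcal{D}(\Omega_2) \to \mathbb{K}$ which can be extended by continuity
to the completion of this space. In Theorem 6.1 this completion has been identified
as $\mathcal{D}(\Omega_1 \times \Omega_2)$.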

We prepare our further study of the tensor product for distributions by some 
technical results. These results are also used for the study of the convolution for 
distributions in the next chapter. 


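Lemma 6.1 Suppose $\Omega_i \subseteq \mathbb{R}^{n_i}$ are nonempty open sets and $\phi : \Omega_1 \times \Omega_2 \to \mathbb{K}$ is
a function with the following properties:

(a) For every $y \in \Omega_2$ define $\phi_y(x) = \phi(x, y)$ for all $x \in \Omega_1$. Then $\phi_y \in \mathcal{D}(\Omega_1)$ for
all $y \in \Omega_2$.

(b) For all $\alpha \in \mathbb{N}^{n_1}$ the function $D_x^\alpha\phi(x, y)$ is continuous on $\Omega_1 \times \Omega_2$.

(c) For every $y_0 \in \Omega_2$ there is a neighborhood $V$ of $y_0$ in $\Omega_2$ and a compact set
$K \subset \Omega_1$ such that for all $y \in V$ the functions $\phi_y$ have their support in $K$.

Then, for every distribution $T \in \mathcal{D}'(\Omega_1)$ on $\Omega_1$, the function $y \mapsto f(y) = \langle T, \phi_y \rangle$
is continuous on $\Omega_2$.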




Proof Suppose $y_0 \in \Omega_2$ and $r > 0$ are given. Choose a neighborhood $V$ of $y_0$
and a compact set $K \subset \Omega_1$ according to hypothesis (c). Since $T$ is a distribution on
$\Omega_1$ there are a constant $C$ and an integer $m$ such that $|\langle T, \phi \rangle| \le C\, p_{K,m}(\phi)$ for all
$\phi \in \mathcal{D}_K(\Omega_1)$. By hypothesis (b) the derivatives $D_x^\alpha\phi(x, y)$ are continuous on $K \times \Omega_2$.
It follows (see the Exercises) that there is a neighborhood $W$ of $y_0$ in $\Omega_2$ such that
for all $y \in W$,

$$p_{K,m}(\phi_y - \phi_{y_0}) \le \frac{r}{C}.$$

Since for all $y \in V \cap W$ the functions $\phi_y$ belong to $\mathcal{D}_K(\Omega_1)$ we get the estimate

$$|f(y) - f(y_0)| = |\langle T, \phi_y \rangle - \langle T, \phi_{y_0} \rangle| = |\langle T, \phi_y - \phi_{y_0} \rangle| \le C\, p_{K,m}(\phi_y - \phi_{y_0}) \le r.$$

Therefore $f$ is continuous at $y_0$ and since $y_0$ was arbitrary in $\Omega_2$, continuity of $f$ on
$\Omega_2$ follows.


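Corollary 6.1 Under the hypotheses of Lemma 6.1 with hypothesis (b) replaced by
the assumption $\phi \in C^\infty(\Omega_1 \times \Omega_2)$, the function $y \mapsto f(y) = \langle T, \phi_y \rangle$ is of class $C^\infty$
on $\Omega_2$ for every distribution $T \in \mathcal{D}'(\Omega_1)$, and one has

$$D_y^\beta \langle T, \phi_y \rangle = \langle T, D_y^\beta\phi_y \rangle.$$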


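Proof Differentiation is known to be a local operation in the sense that it preserves
support properties. Thus we have

1. $D_y^\beta\phi_y \in \mathcal{D}(\Omega_1)$ for all $y \in \Omega_2$;

2. $D_x^\alpha D_y^\beta\phi(x, y)$ is continuous on $\Omega_1 \times \Omega_2$ for all $\alpha \in \mathbb{N}^{n_1}$ and all $\beta \in \mathbb{N}^{n_2}$;

3. For every $\beta \in \mathbb{N}^{n_2}$ and every $y_0 \in \Omega_2$ there are a neighborhood $V$ of $y_0$ in $\Omega_2$
and a compact set $K \subset \Omega_1$ such that $\operatorname{supp} D_y^\beta\phi_y \subseteq K$ for all $y \in V$.

By Lemma 6.1 it follows that, for each $T \in \mathcal{D}'(\Omega_1)$ and each $\beta \in \mathbb{N}^{n_2}$, the functions
$y \mapsto \langle T, D_y^\beta\phi_y \rangle$ are continuous on $\Omega_2$. In order to conclude we have to show that
the functions $\langle T, D_y^\beta\phi_y \rangle$ are just the derivatives of order $\beta$ of the function $\langle T, \phi_y \rangle$.
This is quite a tedious step. We present this step explicitly for $|\beta| = 1$.

Take any $y_0 \in \Omega_2$ and choose a neighborhood $V$ of $y_0$ and the compact set
$K \subset \Omega_1$ according to the third property above. Take any $T \in \mathcal{D}'(\Omega_1)$. For this
compact set $K$ and this distribution there are a constant $C$ and an integer $m$ such that
$|\langle T, \psi \rangle| \le C\, p_{K,m}(\psi)$ for all $\psi \in \mathcal{D}_K(\Omega_1)$. The neighborhood $V$ contains an open
ball $y_0 + B_r(0)$ around $y_0$, for some $r > 0$. Abbreviate $\partial_i = \frac{\partial}{\partial y_i}$ and calculate for
$h \in B_r(0)$, as an identity for $C^\infty$-functions of compact support in $K \subset \Omega_1$,

$$\begin{aligned}
\phi_{y_0+h} - \phi_{y_0} &= \int_0^1 \frac{d}{dt}\,\phi_{y_0+th}\,dt \\
&= \sum_{i=1}^{n_2} (\partial_i\phi)_{y_0}\, h_i + \sum_{i=1}^{n_2} \int_0^1 \left[(\partial_i\phi)_{y_0+th} - (\partial_i\phi)_{y_0}\right] h_i\,dt.
\end{aligned}$$

Applying the distribution $T$ to this identity gives

$$\langle T, \phi_{y_0+h} - \phi_{y_0} \rangle = \sum_{i=1}^{n_2} \langle T, (\partial_i\phi)_{y_0} \rangle\, h_i + \sum_{i=1}^{n_2} \Big\langle T, \int_0^1 \left[(\partial_i\phi)_{y_0+th} - (\partial_i\phi)_{y_0}\right] dt \Big\rangle\, h_i.$$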




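For all $|\alpha| \le m$ and $i = 1, 2, \ldots, n_2$ the functions $D_x^\alpha(\partial_i\phi)(x, y)$ are continuous on
$\Omega_1 \times \Omega_2$ and have a compact support in the compact set $K$ for all $y \in V$. Thus, as
in the proof of Lemma 6.1, given $\varepsilon > 0$ there is $\delta > 0$ such that for all $i = 1, \ldots, n_2$
and all $|y - y_0| < \delta$ one has $p_{K,m}((\partial_i\phi)_y - (\partial_i\phi)_{y_0}) \le \frac{\varepsilon}{C}$, and we can assume $\delta \le r$.
It follows that, for $|h| < \delta$,

$$p_{K,m}\Big( \int_0^1 \left[(\partial_i\phi)_{y_0+th} - (\partial_i\phi)_{y_0}\right] dt \Big) \le \int_0^1 p_{K,m}\big( (\partial_i\phi)_{y_0+th} - (\partial_i\phi)_{y_0} \big)\,dt \le \frac{\varepsilon}{C}$$

and thus

$$\Big| \Big\langle T, \int_0^1 \left[(\partial_i\phi)_{y_0+th} - (\partial_i\phi)_{y_0}\right] dt \Big\rangle \Big| \le C\, p_{K,m}\Big( \int_0^1 \left[(\partial_i\phi)_{y_0+th} - (\partial_i\phi)_{y_0}\right] dt \Big) \le \varepsilon.$$

We deduce that

$$\langle T, \phi_{y_0+h} - \phi_{y_0} \rangle = \sum_{i=1}^{n_2} \langle T, (\partial_i\phi)_{y_0} \rangle\, h_i + o(h).$$

Therefore the function $f(y) = \langle T, \phi_y \rangle$ is differentiable at the point $y_0$ and the
derivative is given by

$$\partial_i \langle T, \phi_y \rangle = \langle T, (\partial_i\phi)_y \rangle, \qquad i = 1, \ldots, n_2.$$

The functions $\partial_i\phi$ satisfy the hypotheses of Lemma 6.1, hence the functions $y \mapsto
\langle T, (\partial_i\phi)_y \rangle$ are continuous and thus the function $f(y) = \langle T, \phi_y \rangle$ has continuous
first-order derivatives.

Since with a function $\phi$ all the functions $(x, y) \mapsto D_y^\beta\phi(x, y)$, $\beta \in \mathbb{N}^{n_2}$, satisfy
the hypothesis of the corollary, the above arguments can be iterated and thus we
conclude.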

The hypotheses of the above corollary are satisfied in particular for test functions
on $\Omega_1 \times \Omega_2$. This case will be used for establishing an important property of the
tensor product for distributions.


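Theorem 6.2 Suppose that $\Omega_i \subset \mathbb{R}^{n_i}$, $i = 1,2$, are nonempty open sets.

(a) For $\phi \in \mathcal{D}(\Omega_1 \times \Omega_2)$ and $T \in \mathcal{D}'(\Omega_1)$ define a function $\psi$ on $\Omega_2$ by

$$\psi(y) = \langle T, \phi_y \rangle.$$

Then $\psi$ is a test function on $\Omega_2$: $\psi \in \mathcal{D}(\Omega_2)$.

(b) Given compact subsets $K_i \subset \Omega_i$ and an integer $m_2$, there is an integer $m_1$
depending on $K_1$ and the distribution $T$ such that

$$p_{K_2,m_2}(\psi) \le p'_{K_1,m_1}(T)\; p_{K_1 \times K_2,\, m_1+m_2}(\phi) \tag{6.12}$$

for all $\phi \in \mathcal{D}_{K_1 \times K_2}(\Omega_1 \times \Omega_2)$.

(c) The assignment $(T, \phi) \mapsto \psi$ defined in part (a) defines a bilinear map $F :
\mathcal{D}'(\Omega_1) \times \mathcal{D}(\Omega_1 \times \Omega_2) \to \mathcal{D}(\Omega_2)$ by $F(T, \phi) = \psi$.

(d) The map $F : \mathcal{D}'(\Omega_1) \times \mathcal{D}(\Omega_1 \times \Omega_2) \to \mathcal{D}(\Omega_2)$ has the following continuity
property: $F$ is continuous in $\phi \in \mathcal{D}(\Omega_1 \times \Omega_2)$, uniformly in $T \in B$, $B$ a weakly
bounded subset of $\mathcal{D}'(\Omega_1)$.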


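Proof It is straightforward to check that a test function $\phi \in \mathcal{D}(\Omega_1 \times \Omega_2)$ satisfies the
hypotheses of Corollary 6.1. Hence this corollary implies that $\psi \in C^\infty(\Omega_2)$. There
are compact subsets $K_i \subset \Omega_i$ such that $\operatorname{supp}\phi \subseteq K_1 \times K_2$. Thus the functions $\phi_y$
are the zero function on $\Omega_1$ for all $y \in \Omega_2 \setminus K_2$ and therefore $\operatorname{supp}\psi \subseteq K_2$. This
proves the first part.

For $\phi \in \mathcal{D}_{K_1 \times K_2}(\Omega_1 \times \Omega_2)$ one knows that all the functions $(D_y^\beta\phi)_y$, $y \in K_2$,
$\beta \in \mathbb{N}^{n_2}$, belong to $\mathcal{D}_{K_1}(\Omega_1)$. Since $T \in \mathcal{D}'(\Omega_1)$, there is an $m_1 \in \mathbb{N}$ such that
$p'_{K_1,m_1}(T)$ is finite and

$$|\langle T, (D_y^\beta\phi)_y \rangle| \le p'_{K_1,m_1}(T)\; p_{K_1,m_1}\big((D_y^\beta\phi)_y\big)$$

for all $y \in K_2$ and all $\beta$. By Corollary 6.1 we know that

$$D^\beta\psi(y) = \langle T, (D_y^\beta\phi)_y \rangle,$$

therefore

$$|D^\beta\psi(y)| \le p'_{K_1,m_1}(T)\; p_{K_1,m_1}\big((D_y^\beta\phi)_y\big) = p'_{K_1,m_1}(T) \sup_{x \in K_1,\ |\alpha| \le m_1} |D_x^\alpha D_y^\beta\phi(x, y)|$$

and we conclude that

$$p_{K_2,m_2}(\psi) \le p'_{K_1,m_1}(T)\; p_{K_1 \times K_2,\, m_1+m_2}(\phi).$$

Thus the second part follows.

Since $F(T, \phi) = \langle T, \phi_{\cdot} \rangle$, $F$ is certainly linear in $T \in \mathcal{D}'(\Omega_1)$. It is easy to see
that for every fixed $y \in \Omega_2$ the map $\phi \mapsto \phi_y$ is a linear map $\mathcal{D}(\Omega_1 \times \Omega_2) \to \mathcal{D}(\Omega_1)$.
Hence $F$ is linear in $\phi$ too and part (c) is proven.

For part (d) observe that by the uniform boundedness principle a (weakly) bounded
set $B \subset \mathcal{D}'(\Omega_1)$ is equicontinuous on $\mathcal{D}_{K_1}(\Omega_1)$ for every compact subset $K_1 \subset \Omega_1$.
This means that we can find some $m_1 \in \mathbb{N}$ such that

$$\sup_{T \in B} p'_{K_1,m_1}(T) < \infty$$

and thus by estimate (6.12) we conclude.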


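Theorem 6.3 (Tensor Product for Distributions) Suppose that $\Omega_i \subset \mathbb{R}^{n_i}$, $i = 1,2$,
are nonempty open sets.

(a) Given $T_i \in \mathcal{D}'(\Omega_i)$ there is exactly one distribution $T \in \mathcal{D}'(\Omega_1 \times \Omega_2)$ on $\Omega_1 \times \Omega_2$
such that

$$\langle T, \phi_1 \otimes \phi_2 \rangle = \langle T_1, \phi_1 \rangle \langle T_2, \phi_2 \rangle \qquad \forall\, \phi_i \in \mathcal{D}(\Omega_i),\ i = 1,2.$$

$T$ is called the tensor product of $T_1$ and $T_2$, denoted by $T_1 \otimes T_2$.

(b) The tensor product satisfies Fubini's Theorem (for distributions), i.e., for every
$T_i \in \mathcal{D}'(\Omega_i)$, $i = 1,2$, and for every $\chi \in \mathcal{D}(\Omega_1 \times \Omega_2)$ one has

$$\begin{aligned}
\langle T_1 \otimes T_2, \chi \rangle &= \langle (T_1 \otimes T_2)(x, y), \chi(x, y) \rangle \\
&= \langle T_1(x), \langle T_2(y), \chi(x, y) \rangle \rangle = \langle T_2(y), \langle T_1(x), \chi(x, y) \rangle \rangle.
\end{aligned}$$

(c) Given compact subsets $K_i \subset \Omega_i$ there are integers $m_i \in \mathbb{N}$ such that $p'_{K_i,m_i}(T_i)$
are finite for $i = 1,2$ and for all $\chi \in \mathcal{D}_{K_1 \times K_2}(\Omega_1 \times \Omega_2)$,

$$|\langle T_1 \otimes T_2, \chi \rangle| \le p'_{K_1,m_1}(T_1)\; p'_{K_2,m_2}(T_2)\; p_{K_1 \times K_2,\, m_1+m_2}(\chi). \tag{6.13}$$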


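Proof Given $T_i \in \mathcal{D}'(\Omega_i)$ and $\chi \in \mathcal{D}(\Omega_1 \times \Omega_2)$ we know by Theorem 6.2 that
$F(T_1, \chi) \in \mathcal{D}(\Omega_2)$. Thus

$$\langle T, \chi \rangle = \langle T_2, F(T_1, \chi) \rangle \tag{6.14}$$

is well defined for all $\chi \in \mathcal{D}(\Omega_1 \times \Omega_2)$. Since $F$ is linear in $\chi$, linearity of $T_2$ implies
linearity of $T$. In order to show that $T$ is a distribution on $\Omega_1 \times \Omega_2$ it suffices to
show that $T$ is continuous on $\mathcal{D}_{K_1 \times K_2}(\Omega_1 \times \Omega_2)$ for arbitrary compact sets $K_i \subset \Omega_i$.
For any $\chi \in \mathcal{D}_{K_1 \times K_2}(\Omega_1 \times \Omega_2)$ we know by Theorem 6.2 that $F(T_1, \chi) \in \mathcal{D}_{K_2}(\Omega_2)$.
Since $T_2 \in \mathcal{D}'(\Omega_2)$ there is $m_2 \in \mathbb{N}$ such that $p'_{K_2,m_2}(T_2)$ is finite, and we have the
estimate

$$|\langle T, \chi \rangle| = |\langle T_2, F(T_1, \chi) \rangle| \le p'_{K_2,m_2}(T_2)\; p_{K_2,m_2}(F(T_1, \chi)).$$

Similarly, since $T_1 \in \mathcal{D}'(\Omega_1)$, there is an $m_1 \in \mathbb{N}$ such that $p'_{K_1,m_1}(T_1)$ is finite so
that the estimate (6.12) applies. Combining these two estimates yields

$$|\langle T, \chi \rangle| \le p'_{K_1,m_1}(T_1)\; p'_{K_2,m_2}(T_2)\; p_{K_1 \times K_2,\, m_1+m_2}(\chi)$$

for all $\chi \in \mathcal{D}_{K_1 \times K_2}(\Omega_1 \times \Omega_2)$ with integers $m_i$ depending on $T_i$ and $K_i$. Thus
continuity of $T$ follows.

For $\chi = \phi_1 \otimes \phi_2$, $\phi_i \in \mathcal{D}(\Omega_i)$, we have $F(T_1, \chi) = \langle T_1, \phi_1 \rangle\, \phi_2$ and therefore the
distribution $T$ factorizes as claimed:

$$\langle T, \phi_1 \otimes \phi_2 \rangle = \langle T_1, \phi_1 \rangle \langle T_2, \phi_2 \rangle.$$

By linearity this property determines $T$ uniquely on the tensor product space $\mathcal{D}(\Omega_1) \otimes
\mathcal{D}(\Omega_2)$ which is known to be dense in $\mathcal{D}(\Omega_1 \times \Omega_2)$ by Proposition 6.1. Now continuity
of $T$ on $\mathcal{D}(\Omega_1 \times \Omega_2)$ implies that $T$ is uniquely determined by $T_1$ and $T_2$. This proves
part (a).

Above we defined $T = T_1 \otimes T_2$ by the formula $\langle T, \chi \rangle = \langle T_2(y), \langle T_1(x), \chi(x, y) \rangle \rangle$
for all $\chi \in \mathcal{D}(\Omega_1 \times \Omega_2)$. With minor changes in the argument one can show that
there is a distribution $S$ on $\Omega_1 \times \Omega_2$, well defined by the formula

$$\langle S, \chi \rangle = \langle T_1(x), \langle T_2(y), \chi(x, y) \rangle \rangle$$

for all $\chi \in \mathcal{D}(\Omega_1 \times \Omega_2)$. Clearly, on the dense subspace $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$ the
continuous functionals $S$ and $T$ agree. Hence they agree on $\mathcal{D}(\Omega_1 \times \Omega_2)$ and this
proves Fubini's theorem for distributions.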
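The two orders of evaluation in Fubini's theorem can be tried out on a small numerical model. This is a sketch under stated assumptions, not a construction from the text: distributions are modelled as Python callables acting on functions, the regular distribution is evaluated by a midpoint rule, and the derivative of the Dirac distribution by a central difference. All helper names (`regular`, `delta_prime`, `tensor`, `tensor_swapped`) are hypothetical.

```python
import math

def bump(t):
    """Smooth function on R, positive for |t| < 1 and zero otherwise."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def regular(f, a=-1.0, b=1.0, n=400):
    """I_f: phi -> integral of f * phi over [a, b], approximated by the midpoint rule."""
    h = (b - a) / n
    nodes = [a + (i + 0.5) * h for i in range(n)]
    return lambda phi: h * sum(f(x) * phi(x) for x in nodes)

def delta_prime(c, eps=1e-5):
    """Derivative of the Dirac distribution at c: phi -> -phi'(c), by a central difference."""
    return lambda phi: -(phi(c + eps) - phi(c - eps)) / (2.0 * eps)

def tensor(T1, T2):
    """<T1 (x) T2, chi> := <T2(y), <T1(x), chi(x, y)>>, following Eq. (6.14)."""
    return lambda chi: T2(lambda y: T1(lambda x: chi(x, y)))

def tensor_swapped(T1, T2):
    """The other order of evaluation: <T1(x), <T2(y), chi(x, y)>>."""
    return lambda chi: T1(lambda x: T2(lambda y: chi(x, y)))

T1 = regular(lambda x: x * x)
T2 = delta_prime(0.3)

# A test function on R^2 that is not a pure tensor product.
chi = lambda x, y: bump(x * x + y * y) * math.cos(x + 2.0 * y)

lhs = tensor(T1, T2)(chi)
rhs = tensor_swapped(T1, T2)(chi)

# Factorization on elementary tensors, as in part (a) of Theorem 6.3.
phi1 = bump
phi2 = lambda y: bump(y) * math.sin(y)
factor_lhs = tensor(T1, T2)(lambda x, y: phi1(x) * phi2(y))
factor_rhs = T1(phi1) * T2(phi2)
```

In this discretized model both identities hold up to rounding, because every model distribution is a finite linear combination of point evaluations; the theorem asserts the same for arbitrary distributions.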




The estimate given in part (c) has been shown in the proof of continuity of $T =
T_1 \otimes T_2$.

The following corollary collects some basic properties of the tensor product for 
distributions. 


Corollary 6.2 Suppose that $T_i$ are distributions on nonempty open sets $\Omega_i \subset \mathbb{R}^{n_i}$.
Then the following holds:

(a) $\operatorname{supp}(T_1 \otimes T_2) = \operatorname{supp} T_1 \times \operatorname{supp} T_2$.
(b) $D_x^\beta(T_1 \otimes T_2) = (D_x^\beta T_1) \otimes T_2$. Here $x$ refers to the variable of $T_1$.

Proof The straightforward proof is done as an exercise.


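Proposition 6.2 The tensor product for distributions is jointly continuous in both
factors, i.e., if $T = \lim_{j \to \infty} T_j$ in $\mathcal{D}'(\Omega_1)$ and $S = \lim_{j \to \infty} S_j$ in $\mathcal{D}'(\Omega_2)$, then

$$T \otimes S = \lim_{j \to \infty} T_j \otimes S_j \quad \text{in } \mathcal{D}'(\Omega_1 \times \Omega_2).$$

Proof Recall that we consider spaces of distributions equipped with the weak topology
$\sigma$ (compare Theorem 3.3). Thus, for every $\chi \in \mathcal{D}(\Omega_1 \times \Omega_2)$, we have to show
that

$$\langle T \otimes S, \chi \rangle = \lim_{j \to \infty} \langle T_j \otimes S_j, \chi \rangle.$$

By Proposition 6.1 and its proof we know: Given $\chi \in \mathcal{D}_K(\Omega_1 \times \Omega_2)$ there are
compact sets $K_i \subset \Omega_i$, $K \subseteq K_1 \times K_2$, such that $\chi$ is the limit in $\mathcal{D}_{K_1 \times K_2}(\Omega_1 \times \Omega_2)$
of a sequence in $\mathcal{D}_{K_1}(\Omega_1) \otimes \mathcal{D}_{K_2}(\Omega_2)$.
Since $T = \lim_{j \to \infty} T_j$, Eq. (3.12) of Theorem 3.3 implies that there is an $m_1 \in \mathbb{N}$
such that

$$p'_{K_1,m_1}(T_j) \le M_1 \qquad \forall\, j \in \mathbb{N}$$

and similarly there is an $m_2 \in \mathbb{N}$ such that

$$p'_{K_2,m_2}(S_j) \le M_2 \qquad \forall\, j \in \mathbb{N}.$$

These bounds also apply to the limits $T$, respectively $S$.
Now, given $\varepsilon > 0$, there is a $\chi_\varepsilon \in \mathcal{D}_{K_1}(\Omega_1) \otimes \mathcal{D}_{K_2}(\Omega_2)$ such that

$$p_{K_1 \times K_2,\, m_1+m_2}(\chi - \chi_\varepsilon) < \frac{\varepsilon}{4 M_1 M_2}.$$

By part (c) of Theorem 6.3 this implies the following estimate:

$$|(T_j \otimes S_j)(\chi - \chi_\varepsilon)| \le M_1 M_2\; p_{K_1 \times K_2,\, m_1+m_2}(\chi - \chi_\varepsilon) < \varepsilon/4 \qquad \forall\, j \in \mathbb{N}.$$

And the same bound results for $T \otimes S$.
Finally we put all information together and get, for all $j \in \mathbb{N}$,

$$\begin{aligned}
|(T \otimes S - T_j \otimes S_j)(\chi)| &\le |(T \otimes S - T_j \otimes S_j)(\chi - \chi_\varepsilon)| + |(T \otimes S - T_j \otimes S_j)(\chi_\varepsilon)| \\
&\le 2 M_1 M_2\; p_{K_1 \times K_2,\, m_1+m_2}(\chi - \chi_\varepsilon) + |(T \otimes S - T_j \otimes S_j)(\chi_\varepsilon)| \\
&< \varepsilon/2 + |(T \otimes S - T_j \otimes S_j)(\chi_\varepsilon)|.
\end{aligned}$$

On $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$ the sequence $(T_j \otimes S_j)_{j \in \mathbb{N}}$ certainly converges to $T \otimes S$ (see
Exercises). Hence there is $j_0 \in \mathbb{N}$ such that $|(T \otimes S - T_j \otimes S_j)(\chi_\varepsilon)| < \varepsilon/2$ for all
$j \ge j_0$. It follows that

$$|(T \otimes S - T_j \otimes S_j)(\chi)| < \varepsilon \qquad \forall\, j \ge j_0.$$

This concludes the proof.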


6.3. Exercises 


1. Prove: Formula (6.5) for the projective tensor product of two seminorms $p, q$ on $E$,
respectively on $F$, defines indeed a seminorm on the tensor product $E \otimes F$.

2. Prove Theorem 6.1!

Hint: Consult the book [1].

3. Complete the proof of Lemma 6.1.

4. Prove the following: Assume that a sequence $(\phi_j)_{j \in \mathbb{N}}$ converges in $\mathcal{D}(\Omega)$ to
$\phi \in \mathcal{D}(\Omega)$ and the sequence of distributions $(T_j)_{j \in \mathbb{N}} \subset \mathcal{D}'(\Omega)$ converges weakly
to $T \in \mathcal{D}'(\Omega)$. Then the sequence of numbers $(T_j(\phi_j))_{j \in \mathbb{N}}$ converges to the
number $T(\phi)$, i.e.,

$$\lim_{j \to \infty} T_j(\phi_j) = T(\phi).$$

Hint: In Appendix C.1 it is shown that a weakly bounded set in $\mathcal{D}'(\Omega)$ is
equicontinuous.

5. Prove Corollary 6.2.

6. Assume $T = \lim_{j \to \infty} T_j$ in $\mathcal{D}'(\Omega_1)$ and $S = \lim_{j \to \infty} S_j$ in $\mathcal{D}'(\Omega_2)$. Prove: For
every $\chi \in \mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$ one has $\lim_{j \to \infty} T_j \otimes S_j(\chi) = T \otimes S(\chi)$.


Reference 


1. Tréves F. Topological vector spaces, distributions and kernels. New York: Academic; 1967. 


Chapter 7 
Convolution Products 


Our goal is to introduce and to study the convolution product for distributions. In order
to explain the difficulties that will arise there, we discuss first the convolution product
for functions. Also for functions the convolution product is only defined under certain
restrictions. Thus we start with the class $C_0(\mathbb{R}^n)$ of continuous functions on $\mathbb{R}^n$ which
have a compact support.


7.1 Convolution of Functions 


Suppose $u, v \in C_0(\mathbb{R}^n)$; then for each $x \in \mathbb{R}^n$ we know that $y \mapsto u(x - y)v(y)$ is
a continuous function of compact support and therefore the integral of this function
over $\mathbb{R}^n$ is well defined. This integral then defines the convolution product $u * v$ of $u$
and $v$ at the point $x$:

$$u * v(x) = \int_{\mathbb{R}^n} u(x - y)\,v(y)\,dy \qquad \forall\, x \in \mathbb{R}^n. \tag{7.1}$$

The following proposition presents elementary properties of the convolution product
on $C_0(\mathbb{R}^n)$.


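Proposition 7.1 The convolution (i.e., the convolution product) is a well-defined
map $C_0(\mathbb{R}^n) \times C_0(\mathbb{R}^n) \to C_0(\mathbb{R}^n)$. For $u, v \in C_0(\mathbb{R}^n)$ one has

i) $u * v = v * u$,
ii) $\operatorname{supp}(u * v) \subseteq \operatorname{supp} u + \operatorname{supp} v$.

Proof We saw above that $u * v$ is a well-defined function on $\mathbb{R}^n$. Note that
$u(x - y)v(y) = 0$ whenever $y \notin \operatorname{supp} v$ or $x - y \notin \operatorname{supp} u$. It follows that the integral
$\int u(x - y)v(y)\,dy$ vanishes whenever $x \in \mathbb{R}^n$ cannot be represented as the
sum of a point in $\operatorname{supp} u$ and a point in $\operatorname{supp} v$. This implies that $\operatorname{supp}(u * v) \subseteq
\overline{\operatorname{supp} u + \operatorname{supp} v} = \operatorname{supp} u + \operatorname{supp} v$, since $\operatorname{supp} u$ and $\operatorname{supp} v$ are compact sets (see
the Exercises). This proves part ii).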


© Springer International Publishing Switzerland 2015 85 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_7 




Since $(x, y) \mapsto u(x - y)v(y)$ is a uniformly continuous function on $\mathbb{R}^n \times \mathbb{R}^n$, the
integration over a compact set gives a continuous function (see the Exercises). Thus,
$u * v$ is a continuous function of compact support.

The change of variables $y \mapsto x - z$ gives

$$\int_{\mathbb{R}^n} u(x - y)\,v(y)\,dy = \int_{\mathbb{R}^n} u(z)\,v(x - z)\,dz = (v * u)(x)$$

and proves part i).
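Both statements of Proposition 7.1 can be checked on a grid. The following sketch is an illustration under stated assumptions: $n = 1$, the integral (7.1) is replaced by a Riemann sum via `numpy.convolve`, and the two functions are hypothetical examples chosen only for the check.

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 801)           # grid symmetric about 0
h = x[1] - x[0]

# Continuous functions with compact support: supp u = [0, 1], supp v = [-1, 0.5].
u = np.where((x >= 0.0) & (x <= 1.0), x * (1.0 - x), 0.0)
v = np.where((x >= -1.0) & (x <= 0.5), (x + 1.0) * (0.5 - x), 0.0)

# Riemann-sum approximation of (u * v)(x) = int u(x - y) v(y) dy on the same grid.
uv = np.convolve(u, v, mode="same") * h
vu = np.convolve(v, u, mode="same") * h

# Part i): commutativity, up to rounding.
commutator = float(np.max(np.abs(uv - vu)))

# Part ii): supp u + supp v = [0, 1] + [-1, 0.5] = [-1, 1.5].
outside = (x < -1.0 - 1e-9) | (x > 1.5 + 1e-9)
max_outside = float(np.max(np.abs(uv[outside])))
```

With an odd number of grid points centered at 0, `mode="same"` returns the convolution sampled on the original grid, which is why no index shift is needed.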

Corollary 7.1 If $u \in C_0^m(\mathbb{R}^n)$ and $v \in C_0(\mathbb{R}^n)$, then $u * v \in C_0^m(\mathbb{R}^n)$ and $D^\alpha(u * v) =
(D^\alpha u) * v$ for all $|\alpha| \le m$; similarly, if $u \in C_0(\mathbb{R}^n)$ and $v \in C_0^m(\mathbb{R}^n)$, then $u * v \in C_0^m(\mathbb{R}^n)$
and $D^\alpha(u * v) = u * (D^\alpha v)$ for all $|\alpha| \le m$.

Proof For $|\alpha| \le m$ the function $(x, y) \mapsto (D^\alpha u)(x - y)v(y)$ is uniformly continuous
on $\mathbb{R}^n \times \mathbb{R}^n$; integration with respect to $y$ is over the compact set $\operatorname{supp} v$ and thus
gives the continuous function $(D^\alpha u) * v$. Now the repeated application of the rules
of differentiation of integrals with respect to parameters implies the first part. From
the commutativity of the convolution product the second part is obvious.

Naturally, the convolution of two functions u and v is defined whenever the integral 
in Eq. (7.1) exists. Obviously this is the case not only for continuous functions 
of compact support but for a much larger class. The following proposition looks 
at a number of cases for which the convolution product has convenient continuity 
properties in the factors and which are useful in practical problems. 


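Proposition 7.2 Let $u, v : \mathbb{R}^n \to \mathbb{K}$ be two measurable functions. Denote $\|f\|_\infty =
\operatorname{ess\,sup}_{x \in \mathbb{R}^n} |f(x)|$ and $\|f\|_1 = \int_{\mathbb{R}^n} |f(x)|\,dx$. Then the following holds.

a) If $u \in L^\infty(\mathbb{R}^n)$ and $v \in L^1(\mathbb{R}^n)$, then $u * v \in L^\infty(\mathbb{R}^n)$ and $\|u * v\|_\infty \le \|u\|_\infty \|v\|_1$.

b) If $u \in L^1(\mathbb{R}^n)$ and $v \in L^\infty(\mathbb{R}^n)$, then $u * v \in L^\infty(\mathbb{R}^n)$ and $\|u * v\|_\infty \le \|u\|_1 \|v\|_\infty$.

c) If $u, v \in L^1(\mathbb{R}^n)$, then $u * v \in L^1(\mathbb{R}^n)$ and $\|u * v\|_1 \le \|u\|_1 \|v\|_1$.

Proof Consider the first case. One has $|\int_{\mathbb{R}^n} u(x - y)v(y)\,dy| \le \int_{\mathbb{R}^n} |u(x - y)|\,|v(y)|\,dy
\le \|u\|_\infty \|v\|_1$ and part a) follows. Similarly one proves b). For the third part we have
to use Fubini's theorem:

$$\begin{aligned}
\|u * v\|_1 &= \int_{\mathbb{R}^n} |(u * v)(x)|\,dx \le \int_{\mathbb{R}^n} \Big( \int_{\mathbb{R}^n} |u(x - y)|\,|v(y)|\,dy \Big)\,dx \\
&\le \int_{\mathbb{R}^n} \Big( \int_{\mathbb{R}^n} |u(x - y)|\,|v(y)|\,dx \Big)\,dy \\
&= \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} |u(z)|\,|v(y)|\,dz\,dy = \|u\|_1 \|v\|_1.
\end{aligned}$$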
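The three norm estimates of Proposition 7.2 have exact discrete counterparts, which makes them easy to test. The sketch below assumes $n = 1$ and replaces integrals by Riemann sums; the functions `u` and `v` are hypothetical examples, and `l1` and `sup` are illustrative stand-ins for $\|\cdot\|_1$ and $\|\cdot\|_\infty$.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)
h = x[1] - x[0]

u = np.exp(-np.abs(x)) * np.sign(x)          # integrable and bounded, changes sign
v = 1.0 / (1.0 + x ** 2) ** 2                # integrable and bounded

conv = np.convolve(u, v, mode="same") * h    # Riemann sum for (u * v)(x)

l1 = lambda f: float(np.sum(np.abs(f)) * h)  # discrete L^1 norm
sup = lambda f: float(np.max(np.abs(f)))     # discrete sup norm

tol = 1e-12                                  # slack for rounding errors only
bound_a = sup(conv) <= sup(u) * l1(v) + tol  # part a)
bound_b = sup(conv) <= l1(u) * sup(v) + tol  # part b)
bound_c = l1(conv) <= l1(u) * l1(v) + tol    # part c)
```

All three flags evaluate to `True`, since the discrete sums obey the same triangle-inequality argument as the integrals in the proof.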


Another important case where the convolution product of functions is well defined 
and has useful properties is the case of strongly decreasing functions. The following 
proposition collects the main results. 




Proposition 7.3

1. If $u, v \in \mathcal{S}(\mathbb{R}^n)$, then the convolution $u * v$ is a well-defined element in $\mathcal{S}(\mathbb{R}^n)$.
2. Equipped with the convolution $*$ as a product, the space $\mathcal{S}(\mathbb{R}^n)$ of strongly
decreasing test functions is a commutative algebra.

Proof Recall the basic characterization of strongly decreasing test functions: $u \in
C^\infty(\mathbb{R}^n)$ belongs to $\mathcal{S}(\mathbb{R}^n)$ if, and only if, for all $m, l \in \mathbb{N}$ the norms $p_{m,l}(u)$ are finite.
It follows that for every $\alpha \in \mathbb{N}^n$ and every $m \in \mathbb{N}$ one has

$$|D^\alpha u(x)| \le \frac{p_{m,|\alpha|}(u)}{(1 + x^2)^{m/2}} \qquad \forall\, x \in \mathbb{R}^n.$$

Thus, for $u, v \in \mathcal{S}(\mathbb{R}^n)$, for all $m, k \in \mathbb{N}$ and all $\alpha \in \mathbb{N}^n$, the estimate

$$\int_{\mathbb{R}^n} |(D^\alpha u)(x - y)\,v(y)|\,dy \le \int_{\mathbb{R}^n} \frac{p_{m,|\alpha|}(u)}{(1 + (x - y)^2)^{m/2}}\, \frac{p_{k,0}(v)}{(1 + y^2)^{k/2}}\,dy$$

is available, uniformly in $x \in \mathbb{R}^n$. If we choose $k \ge n + 1$, the integral on the right
hand side is finite; therefore in this case the convolution $(D^\alpha u) * v$ exists. As earlier
one shows $D^\alpha(u * v) = (D^\alpha u) * v$, and we deduce $u * v \in C^\infty(\mathbb{R}^n)$.

In order to control the decay properties of the convolution $u * v$ observe that

$$\frac{1 + x^2}{1 + (x - y)^2} \le 2\,(1 + y^2) \qquad \forall\, x, y \in \mathbb{R}^n.$$

For $k = n + 1 + m$ we thus get

$$(1 + x^2)^{m/2}\, |D^\alpha(u * v)(x)| \le p_{m,|\alpha|}(u)\, p_{k,0}(v)\, 2^{m/2} \int_{\mathbb{R}^n} \frac{dy}{(1 + y^2)^{\frac{n+1}{2}}}.$$

The integral in this estimate has a finite value $C$. This holds for any $m \in \mathbb{N}$ and any
$\alpha \in \mathbb{N}^n$. We conclude that $u * v \in \mathcal{S}(\mathbb{R}^n)$ and

$$p_{m,l}(u * v) \le 2^{m/2}\, C\, p_{m,l}(u)\, p_{n+1+m,0}(v). \tag{7.2}$$

This estimate also shows that the convolution is continuous on $\mathcal{S}(\mathbb{R}^n)$. As earlier the
commutativity of the convolution is shown: $u * v = v * u$ for all $u, v \in \mathcal{S}(\mathbb{R}^n)$. This
proves the second part and thus the proposition.
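A concrete instance of the first statement is the Gaussian: for $g(x) = e^{-x^2/2}$ on $\mathbb{R}$ a direct computation gives $(g * g)(x) = \sqrt{\pi}\, e^{-x^2/4}$, which is again strongly decreasing. The sketch below compares this closed form with a Riemann-sum convolution; the grid and its truncation to $[-12, 12]$ are assumptions made for the illustration.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 2401)        # g is negligible outside this interval
h = x[1] - x[0]

g = np.exp(-x ** 2 / 2.0)                 # an element of S(R)

numeric = np.convolve(g, g, mode="same") * h          # Riemann sum for g * g
exact = np.sqrt(np.pi) * np.exp(-x ** 2 / 4.0)        # closed form of g * g

max_error = float(np.max(np.abs(numeric - exact)))
```

The agreement is close to machine precision because Riemann sums of rapidly decreasing smooth functions converge very quickly.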

The main application of the convolution is in the approximation of functions by
smooth (i.e., $C^\infty$-) functions and in the approximation of distributions by smooth
functions. The basic technical preparation is provided by the following proposition.


Proposition 7.4 Suppose that $(\phi_i)_{i \in \mathbb{N}}$ is a sequence of continuous functions on $\mathbb{R}^n$
with support in the closed ball $\overline{B_R(0)} = \{x \in \mathbb{R}^n : |x| \le R\}$. Assume furthermore

i) $0 \le \phi_i(x)$ for all $x \in \mathbb{R}^n$ and all $i \in \mathbb{N}$;
ii) $\int_{\mathbb{R}^n} \phi_i(x)\,dx = 1$ for all $i \in \mathbb{N}$;
iii) For every $r > 0$ one has $\lim_{i \to \infty} \int_{B_r(0)^c} \phi_i(x)\,dx = 0$.

For $u \in C(\mathbb{R}^n)$ define an approximating sequence by the convolution

$$u_i = u * \phi_i, \qquad i \in \mathbb{N}.$$

Then the following statements hold:

a) The sequence $u_i$ converges to the given function $u$, uniformly on every compact
subset $K \subset \mathbb{R}^n$;

b) For $u \in C^m(\mathbb{R}^n)$ and $|\alpha| \le m$ the sequence of derivatives $D^\alpha u_i$ converges,
uniformly on every compact set $K$, to the corresponding derivative $D^\alpha u$ of the
given function $u$;

c) If in addition to the above assumptions the functions $\phi_i$ are of class $C^\infty$, then the
approximating functions $u_i = u * \phi_i$ are of class $C^\infty$ and statements a) and b)
hold.


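Proof In order to prove part a) we have to show: given a compact set $K \subset \mathbb{R}^n$ and
$\varepsilon > 0$ there is an $i_0 \in \mathbb{N}$ (depending on $K$ and $\varepsilon$) such that for all $i \ge i_0$,

$$\|u - u_i\|_{K,\infty} = \sup_{x \in K} |u(x) - u_i(x)| \le \varepsilon.$$

With $K$ also the set $H = K + \overline{B_R(0)}$ is compact in $\mathbb{R}^n$. Therefore, as a continuous
function on $\mathbb{R}^n$, $u$ is bounded on $H$, by $M$ let us say. Since continuous functions are
uniformly continuous on compact sets, given $\varepsilon > 0$ there is a $\delta > 0$ such that for
all $x, x' \in H$ one has $|u(x) - u(x')| < \frac{\varepsilon}{2}$ whenever $|x - x'| < \delta$. The normalization
condition ii) for the functions $\phi_i$ allows us to write

$$u(x) - u_i(x) = \int_{\mathbb{R}^n} [u(x) - u(x - y)]\,\phi_i(y)\,dy$$

and thus, for all $x \in K$, we can estimate as follows:

$$\begin{aligned}
|u(x) - u_i(x)| &\le \int_{B_\delta(0)} |u(x) - u(x - y)|\,\phi_i(y)\,dy + \int_{B_\delta(0)^c} |u(x) - u(x - y)|\,\phi_i(y)\,dy \\
&\le \int_{B_\delta(0)} \frac{\varepsilon}{2}\,\phi_i(y)\,dy + 2M \int_{B_\delta(0)^c} \phi_i(y)\,dy \\
&\le \frac{\varepsilon}{2} + 2M \int_{B_\delta(0)^c} \phi_i(y)\,dy.
\end{aligned}$$

According to hypothesis iii) there is an $i_0 \in \mathbb{N}$ such that $\int_{B_\delta(0)^c} \phi_i(y)\,dy < \frac{\varepsilon}{4M}$ for all
$i \ge i_0$. Thus we can continue the above estimate by

$$|u(x) - u_i(x)| < \frac{\varepsilon}{2} + 2M\,\frac{\varepsilon}{4M} = \varepsilon \qquad \forall\, x \in K,\ \forall\, i \ge i_0.$$

This implies statement a).

If $u \in C^m(\mathbb{R}^n)$ and $|\alpha| \le m$, then $D^\alpha u \in C(\mathbb{R}^n)$ and by part a) we know $(D^\alpha u) *
\phi_i \to D^\alpha u$, uniformly on compact sets. Corollary 7.1 implies that $D^\alpha u_i = D^\alpha(u *
\phi_i) = (D^\alpha u) * \phi_i$. Hence part b) follows.

This corollary also implies that $u_i = u * \phi_i \in C^\infty(\mathbb{R}^n)$ whenever $\phi_i \in C^\infty(\mathbb{R}^n)$.
Thus we can argue as in the previous two cases.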

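Naturally the question arises how to get sequences of functions $\phi_i$ with the properties
i)–iii) used in the above proposition. Recall the section on test function spaces.
There, in the Exercises, we defined a nonnegative function $\rho \in \mathcal{D}(\mathbb{R}^n)$ by Eq. (2.14).
Denote $a = \int_{\mathbb{R}^n} \rho(x)\,dx$ and define

$$\phi_i(x) = \frac{i^n}{a}\,\rho(ix) \qquad \forall\, x \in \mathbb{R}^n,\ \forall\, i \in \mathbb{N}. \tag{7.3}$$

Given $\varepsilon > 0$, choose $i_0 > \frac{1}{\varepsilon}$. Then for all $i \ge i_0$ one has

$$\int_{B_\varepsilon(0)^c} \phi_i(x)\,dx = \frac{1}{a} \int_{|x| \ge \varepsilon} i^n \rho(ix)\,dx = \frac{1}{a} \int_{|y| \ge i\varepsilon} \rho(y)\,dy = 0,$$

since $\operatorname{supp}\rho \subseteq \{y \in \mathbb{R}^n : |y| \le 1\}$. Now it is clear that this sequence satisfies the
hypotheses of Proposition 7.4.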
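The construction (7.3) and statement a) of Proposition 7.4 can be tried out in one dimension. The sketch below is an illustration under stated assumptions: $\rho$ is taken to be the standard bump $\exp(-1/(1 - x^2))$ on $(-1, 1)$, which may differ from the function defined in Eq. (2.14), the integrals are Riemann sums, and $u(x) = |x|$ is a hypothetical example of a continuous function that is not differentiable.

```python
import numpy as np

def rho(x):
    """Nonnegative smooth function with support in [-1, 1] (assumed form of rho)."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

x = np.linspace(-3.0, 3.0, 6001)          # symmetric grid, odd number of points
h = x[1] - x[0]
a = float(np.sum(rho(x)) * h)             # a = int rho(x) dx, by a Riemann sum

def phi(i):
    """phi_i(x) = (i / a) * rho(i x) for n = 1, as in Eq. (7.3)."""
    return (i / a) * rho(i * x)

u = np.abs(x)                             # continuous, not differentiable at 0
K = np.abs(x) <= 1.0                      # compact set K = [-1, 1]

masses, errors = [], []
for i in (1, 4, 16):
    phi_i = phi(i)
    masses.append(float(np.sum(phi_i) * h))            # hypothesis ii)
    u_i = np.convolve(u, phi_i, mode="same") * h       # u_i = u * phi_i
    errors.append(float(np.max(np.abs(u - u_i)[K])))   # sup over K
```

For this choice of $u$ the error on $K$ is attained at the kink $x = 0$ and scales like $1/i$, so the entries of `errors` shrink by roughly a factor of four from one value of $i$ to the next.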


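Corollary 7.2 Suppose $\Omega \subset \mathbb{R}^n$ is a nonempty open set and $K \subset \Omega$ is compact.
Given $\varepsilon$, $0 < \varepsilon < \operatorname{dist}(\partial\Omega, K)$, denote $K_\varepsilon = \{x \in \Omega : \operatorname{dist}(K, x) \le \varepsilon\}$. Then, for
any continuous function $f$ on $\Omega$ with support in $K$ there is a sequence $(u_i)_{i \in \mathbb{N}}$ in
$\mathcal{D}_{K_\varepsilon}(\Omega)$ such that

$$\lim_{i \to \infty} p_{K_\varepsilon,0}(f - u_i) = 0.$$

If the function $f$ is nonnegative, then also all the elements $u_i$ of the approximating
sequence can be chosen to be nonnegative.

Proof See the Exercises.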


7.2 Regularization of Distributions 


This section explains how to approximate distributions by smooth functions. This 
approximation is understood in the sense of the weak topology on the space of 
distributions and is based on the convolution of distributions with test functions. 

Given a test function $\phi \in \mathcal{D}(\mathbb{R}^n)$ and a point $x \in \mathbb{R}^n$, the function $y \mapsto \phi_x(y) =
\phi(x - y)$ is again a test function and thus every distribution on $\mathbb{R}^n$ can be applied to
it. Therefore one can define, for any $T \in \mathcal{D}'(\mathbb{R}^n)$,

$$(T * \phi)(x) = \langle T, \phi_x \rangle = \langle T(y), \phi(x - y) \rangle \qquad \forall\, x \in \mathbb{R}^n. \tag{7.4}$$

This function $T * \phi : \mathbb{R}^n \to \mathbb{K}$ is called the regularization of the distribution $T$ by
the test function $\phi$, since we will learn soon that $T * \phi$ is actually a smooth function.




This definition of a convolution product between a distribution and a test function
is compatible with the embedding of functions into the space of distributions. To see
this take any $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ and use the above definition to get

$$(I_f * \phi)(x) = \langle I_f, \phi_x \rangle = \int_{\mathbb{R}^n} f(y)\,\phi_x(y)\,dy = (f * \phi)(x) \qquad \forall\, x \in \mathbb{R}^n,$$

where naturally $f * \phi$ is the convolution product of functions as discussed earlier.
Basic properties of the regularization are collected in the following theorem.


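Theorem 7.1 (Regularization) For any $T \in \mathcal{D}'(\mathbb{R}^n)$ and any $\phi, \psi \in \mathcal{D}(\mathbb{R}^n)$ one
has:

a) $T * \phi \in C^\infty(\mathbb{R}^n)$ and, for all $\alpha \in \mathbb{N}^n$,

$$D^\alpha(T * \phi) = T * D^\alpha\phi = D^\alpha T * \phi;$$

b) $\operatorname{supp}(T * \phi) \subseteq \operatorname{supp} T + \operatorname{supp}\phi$;
c) $\langle T, \phi \rangle = (T * \check{\phi})(0)$ where $\check{\phi}(x) = \phi(-x)$ for all $x \in \mathbb{R}^n$;
d) $(T * \phi) * \psi = T * (\phi * \psi)$.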


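Proof For any test function $\phi \in \mathcal{D}(\mathbb{R}^n)$ we know that $\chi(x, y) = \phi(x - y)$ belongs
to $C^\infty(\mathbb{R}^n \times \mathbb{R}^n)$. Given any $x_0 \in \mathbb{R}^n$ take a compact neighborhood $V_{x_0}$ of $x_0$ in
$\mathbb{R}^n$. Then $K = V_{x_0} - \operatorname{supp} \phi \subset \mathbb{R}^n$ is compact, and for all $x \in V_{x_0}$ we know that
$\operatorname{supp} \chi_x \subseteq \{x\} - \operatorname{supp} \phi \subseteq K$, where $\chi_x(y) = \chi(x, y)$. It follows that all hypotheses of
Corollary 6.1 are satisfied and hence this corollary implies $T * \phi \in C^\infty(\mathbb{R}^n)$ and
$D^\alpha (T * \phi) = T * D^\alpha \phi$.

Now observe $D_x^\alpha \phi(x - y) = (-1)^{|\alpha|} D_y^\alpha \phi(x - y)$, hence, for all $x \in \mathbb{R}^n$,

$$(T * D^\alpha \phi)(x) = \langle T(y), (D^\alpha \phi)(x - y) \rangle = \langle T(y), (-1)^{|\alpha|} D_y^\alpha \phi(x - y) \rangle
= \langle D^\alpha T(y), \phi(x - y) \rangle = (D^\alpha T * \phi)(x).$$

This proves part a).

In order that $(T * \phi)(x)$ does not vanish, the sets $\{x\} - \operatorname{supp} T$ and $\operatorname{supp} \phi$ must
have a nonempty intersection, i.e., $x \in \operatorname{supp} T + \operatorname{supp} \phi$. Since $\operatorname{supp} \phi$ is compact
and $\operatorname{supp} T$ is closed, the vector sum $\operatorname{supp} T + \operatorname{supp} \phi$ is closed. It follows that

$$\operatorname{supp}(T * \phi) = \overline{\{x \in \mathbb{R}^n : (T * \phi)(x) \neq 0\}} \subseteq \overline{\operatorname{supp} T + \operatorname{supp} \phi} = \operatorname{supp} T + \operatorname{supp} \phi,$$

and this proves part b).

The proof of part c) is a simple calculation:

$$(T * \check{\phi})(0) = \langle T(y), \check{\phi}(0 - y) \rangle = \langle T(y), \phi(y) \rangle = \langle T, \phi \rangle.$$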


Proposition 7.1 and Corollary 7.1 together show that $\phi * \psi \in \mathcal{D}(\mathbb{R}^n)$ for all
$\phi, \psi \in \mathcal{D}(\mathbb{R}^n)$. Hence, by part a) we know that $T * (\phi * \psi)$ is a well-defined
$C^\infty$-function on $\mathbb{R}^n$. For every $x \in \mathbb{R}^n$ it is given by

$$\Big\langle T(y), \int_{\mathbb{R}^n} \phi(x - y - z)\,\psi(z)\,dz \Big\rangle.$$

As we know that $T * \phi$ is a $C^\infty$-function, the convolution product $(T * \phi) * \psi$ has the
representation, for all $x \in \mathbb{R}^n$,

$$\int_{\mathbb{R}^n} (T * \phi)(x - z)\,\psi(z)\,dz = \int_{\mathbb{R}^n} \langle T(y), \phi(x - y - z) \rangle\,\psi(z)\,dz.$$

Hence the proof of part d) is completed by showing that the action of the distribution
$T$ with respect to the variable $y$ and integration over $\mathbb{R}^n$ with respect to the variable
$z$ can be exchanged. This is done in the Exercises.

Note that in part b) the inclusion can be proper. A simple example is the constant
distribution $T = I_1$, and test functions $\phi \in \mathcal{D}(\mathbb{R})$ with $\int_{\mathbb{R}} \phi(x)\,dx = 0$. Then we have
$(T * \phi)(x) = \int_{\mathbb{R}} \phi(x - y)\,dy = \int_{\mathbb{R}} \phi(y)\,dy = 0$ for all $x \in \mathbb{R}$, thus $\operatorname{supp}(T * \phi) = \emptyset$ while $\operatorname{supp} I_1 = \mathbb{R}$.
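
As a quick sanity check of Theorem 7.1, one can work out the regularization of Dirac's delta distribution and of its derivative directly from definition (7.4). The lines below are only an illustrative computation for $n = 1$; nothing beyond (7.4) and the definition of $\delta'$ is assumed.

```latex
\begin{align*}
(\delta * \phi)(x)  &= \langle \delta(y), \phi(x - y) \rangle = \phi(x), \\
(\delta' * \phi)(x) &= \langle \delta'(y), \phi(x - y) \rangle
                     = -\frac{d}{dy}\,\phi(x - y)\Big|_{y = 0} = \phi'(x).
\end{align*}
```

Both results agree with the theorem: $D(\delta * \phi) = \delta' * \phi = \delta * D\phi = \phi'$ (part a), $\operatorname{supp}(\delta * \phi) = \{0\} + \operatorname{supp}\phi$ (part b, here with equality), and $(\delta * \check{\phi})(0) = \check{\phi}(0) = \phi(0) = \langle \delta, \phi \rangle$ (part c).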

As preparation for the main result of this section, namely the approximation 
of distributions by smooth functions, we introduce the concept of a regularizing 
sequence. 


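Definition 7.1 A sequence of smooth functions $\phi_j$ on $\mathbb{R}^n$ is called a regularizing
sequence if, and only if, it has the following properties.

a) There is a $\phi \in \mathcal{D}(\mathbb{R}^n)$, $\phi \neq 0$, such that $\phi_j(x) = j^n \phi(jx)$ for all $x \in \mathbb{R}^n$,
   $j = 1, 2, \ldots$;
b) $\phi_j \in \mathcal{D}(\mathbb{R}^n)$ for all $j \in \mathbb{N}$;
c) $0 \leq \phi_j(x)$ for all $x \in \mathbb{R}^n$ and all $j \in \mathbb{N}$;
d) $\int_{\mathbb{R}^n} \phi_j(x)\,dx = 1$ for all $j \in \mathbb{N}$.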


Certainly, if we choose a test function $\phi \in \mathcal{D}(\mathbb{R}^n)$ which is nonnegative and which is
normalized by $\int_{\mathbb{R}^n} \phi(x)\,dx = 1$ and introduce the elements of the sequence as in part
a), then we get a regularizing sequence. Note furthermore that every regularizing
sequence converges to Dirac's delta distribution $\delta$ since regularizing sequences are
special delta sequences, as discussed earlier.
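
The construction of a regularizing sequence can be checked numerically in dimension $n = 1$. The following is a minimal sketch, not part of the original text: the bump function `bump`, the midpoint quadrature `integrate`, and all other names are illustrative choices. It verifies property d) and shows that the pairing of $\phi_j$ with a smooth function approaches the value of that function at the origin, as expected for a delta sequence.

```python
import math

def bump(x):
    """Unnormalized test function: exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(func, a, b, steps=4000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# Normalizing constant so that phi integrates to 1.
NORM = integrate(bump, -1.0, 1.0)

def phi(x):
    """Nonnegative, normalized test function with support [-1, 1]."""
    return bump(x) / NORM

def phi_j(j, x):
    """Element of the regularizing sequence for n = 1: j * phi(j * x)."""
    return j * phi(j * x)

def pairing(j, psi):
    """Approximate the integral of phi_j * psi over supp phi_j = [-1/j, 1/j]."""
    return integrate(lambda x: phi_j(j, x) * psi(x), -1.0 / j, 1.0 / j)

if __name__ == "__main__":
    for j in (1, 4, 16, 64):
        # First column: total mass of phi_j; second column: pairing with cos.
        print(j, pairing(j, lambda x: 1.0), pairing(j, math.cos))
```

The second printed column should approach $\cos(0) = 1$ as $j$ grows, with an error that shrinks roughly like $1/j^2$ for this symmetric choice of $\phi$.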


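Theorem 7.2 (Approximation of Distributions) For any $T \in \mathcal{D}'(\mathbb{R}^n)$ and any
regularizing sequence $(\phi_j)_{j \in \mathbb{N}}$, the limit in $\mathcal{D}'(\mathbb{R}^n)$ of the sequence of $C^\infty$-functions
$T_j$ on $\mathbb{R}^n$, defined by $T_j = T * \phi_j$ for all $j = 1, 2, \ldots$, is $T$, i.e.,

$$T = \lim_{j \to \infty} T_j = \lim_{j \to \infty} T * \phi_j \qquad \text{in } \mathcal{D}'(\mathbb{R}^n).$$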


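Proof According to Theorem 7.1 we know that $T * \phi_j \in C^\infty(\mathbb{R}^n)$. If $\phi \in \mathcal{D}(\mathbb{R}^n)$ is
the starting element of the regularizing sequence, we also know $\operatorname{supp} \phi_j \subseteq \operatorname{supp} \phi$
for all $j \in \mathbb{N}$. Take any $\psi \in \mathcal{D}(\mathbb{R}^n)$, then $K = \operatorname{supp} \phi - \operatorname{supp} \psi$ is compact and
$\operatorname{supp}(\phi_j * \check{\psi}) \subseteq K$ for all $j \in \mathbb{N}$ (see part ii) of Proposition 7.1). Part c) of Proposition
7.4 implies that the sequence $D^\alpha(\phi_j * \check{\psi})$ converges uniformly on $K$ to $D^\alpha \check{\psi}$, for all
$\alpha \in \mathbb{N}^n$, hence the sequence $(\phi_j * \check{\psi})_{j \in \mathbb{N}}$ converges to $\check{\psi}$ in $\mathcal{D}_K(\mathbb{R}^n)$, and
consequently the reflected sequence $(\phi_j * \check{\psi})^{\vee}$ converges to $\psi$ in $\mathcal{D}_{-K}(\mathbb{R}^n)$. Now use parts
c) and d) of Theorem 7.1 to conclude through the following chain of identities, using the
continuity of $T$ on $\mathcal{D}_{-K}(\mathbb{R}^n)$:

$$\lim_{j \to \infty} \int_{\mathbb{R}^n} (T * \phi_j)(x)\,\psi(x)\,dx = \lim_{j \to \infty} \big((T * \phi_j) * \check{\psi}\big)(0)
= \lim_{j \to \infty} \big(T * (\phi_j * \check{\psi})\big)(0)
= \lim_{j \to \infty} \langle T, (\phi_j * \check{\psi})^{\vee} \rangle = \langle T, \psi \rangle.$$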
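
Theorem 7.2 can be illustrated on the Heaviside function $\theta$, regarded as a regular distribution on $\mathbb{R}$. The sketch below is an assumption-laden numerical experiment rather than anything taken from the text: `bump`, `midpoint`, `theta_reg`, and `pair_with` are hypothetical helper names, and the integrals are approximated by a midpoint rule. It uses $(\theta * \phi_j)(x) = \int_{-\infty}^{x} \phi_j(s)\,ds$, which follows from (7.4) by the substitution $s = x - y$.

```python
import math

def bump(x):
    """Unnormalized test function supported in [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def midpoint(func, a, b, steps=400):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

NORM = midpoint(bump, -1.0, 1.0)

def phi_j(j, x):
    """Regularizing sequence for n = 1 built from the normalized bump."""
    return j * bump(j * x) / NORM

def theta_reg(j, x):
    """(theta * phi_j)(x): the integral of phi_j over (-inf, x]."""
    lo, hi = -1.0 / j, min(x, 1.0 / j)
    if hi <= lo:
        return 0.0
    return midpoint(lambda s: phi_j(j, s), lo, hi)

def pair_with(j, psi, a=-2.0, b=2.0, steps=400):
    """Approximate <theta * phi_j, psi> for psi supported in [a, b]."""
    return midpoint(lambda x: theta_reg(j, x) * psi(x), a, b, steps)

if __name__ == "__main__":
    psi = lambda x: bump((x - 0.5) / 1.5)   # test function supported in [-1, 2]
    exact = midpoint(psi, 0.0, 2.0)         # <theta, psi>
    for j in (2, 4, 8, 16):
        print(j, abs(pair_with(j, psi) - exact))
```

Each regularized function `theta_reg(j, .)` is smooth, vanishes for $x \leq -1/j$, and equals $1$ for $x \geq 1/j$; the printed differences should decrease with $j$, in line with convergence in $\mathcal{D}'(\mathbb{R})$.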


Remark 7.1

a) The convolution gives a bilinear mapping $\mathcal{D}'(\mathbb{R}^n) \times \mathcal{D}(\mathbb{R}^n) \to C^\infty(\mathbb{R}^n)$
   defined by $(T, \phi) \mapsto T * \phi$.

b) Theorem 7.2 shows that $C^\infty(\mathbb{R}^n)$ is dense in $\mathcal{D}'(\mathbb{R}^n)$. In the Exercises we show that
   also $\mathcal{D}(\mathbb{R}^n)$ is dense in $\mathcal{D}'(\mathbb{R}^n)$. We mention without proof that for any nonempty
   open set $\Omega \subset \mathbb{R}^n$ the test function space $\mathcal{D}(\Omega)$ is dense in the space $\mathcal{D}'(\Omega)$ of
   distributions on $\Omega$.

c) The results of this section show that, and how, every distribution is the limit
   of a sequence of $C^\infty$-functions. This observation can be used to derive another
   characterization of distributions. In this characterization a distribution is defined
   as a certain equivalence class of Cauchy sequences of $C^\infty$-functions. Here a
   sequence of $C^\infty$-functions $f_j$ is said to be a Cauchy sequence if, and only if,
   $\int f_j(x)\phi(x)\,dx$ is a Cauchy sequence of numbers, for every test function $\phi$. And
   two such sequences are called equivalent if, and only if, the difference sequence
   is a null sequence.

d) We mention a simple but useful observation. The convolution product is translation
   invariant in both factors, i.e., for $T \in \mathcal{D}'(\mathbb{R}^n)$, $\phi \in \mathcal{D}(\mathbb{R}^n)$, and every $a \in \mathbb{R}^n$
   one has

   $$(T * \phi)_a = T_a * \phi = T * \phi_a.$$

   For the definition of the translation of functions and distributions compare
   Eq. (4.6).


We conclude this section with an important result about the connection between 
differentiation in the sense of distributions and in the classical sense. The key of the 
proof is to use regularization. 


Lemma 7.1 Suppose $u, f \in C(\mathbb{R}^n)$ satisfy the equation $D_j u = f$ in the sense of
distributions. Then this identity holds in the classical sense too.

Proof Suppose two continuous functions $u, f$ are related by $f = D_j u = \frac{\partial u}{\partial x_j}$ in the
sense of distributions. This means that for every test function $\phi$ the identity

$$-\int u(y)\,D_j \phi(y)\,dy = \int f(y)\,\phi(y)\,dy$$

holds. Next choose a regularizing sequence. Assume that $\psi \in \mathcal{D}(\mathbb{R}^n)$ is nonnegative and satisfies
$\int \psi(y)\,dy = 1$. Define, for $\varepsilon > 0$, $\psi_\varepsilon(x) = \varepsilon^{-n} \psi(x/\varepsilon)$ (with $\varepsilon = 1/i$, $i \in \mathbb{N}$, we have
a regularizing sequence as above). Now approximate $u$ and $f$ by smooth functions:

$$u_\varepsilon = u * \psi_\varepsilon, \qquad f_\varepsilon = f * \psi_\varepsilon.$$

$u_\varepsilon$ and $f_\varepsilon$ are $C^\infty$-functions, and as $\varepsilon \to 0$, they converge to $u$, respectively $f$,
uniformly on compact sets (see Proposition 7.4). A small calculation shows that

$$D_j u_\varepsilon(x) = \int u(y)\,D_{x_j} \psi_\varepsilon(x - y)\,dy = -\int u(y)\,D_{y_j} \psi_\varepsilon(x - y)\,dy,$$

and taking the identity $D_j u = f$ in $\mathcal{D}'(\mathbb{R}^n)$ into account we find

$$D_j u_\varepsilon(x) = \int f(y)\,\psi_\varepsilon(x - y)\,dy = f_\varepsilon(x).$$

Denote the standard unit vector in $\mathbb{R}^n$ in coordinate direction $j$ by $e_j$ and calculate,
for $h \in \mathbb{R}$, $h \neq 0$,

$$\frac{1}{h}\,[u_\varepsilon(x + h e_j) - u_\varepsilon(x)] = \int_0^1 (D_j u_\varepsilon)(x + t h e_j)\,dt = \int_0^1 f_\varepsilon(x + t h e_j)\,dt.$$

Take the limit $\varepsilon \to 0$ of this equation. Since $u_\varepsilon$ and $f_\varepsilon$ converge uniformly on compact
sets to $u$, respectively $f$, we get in the limit for all $|h| < 1$, $h \neq 0$,

$$\frac{1}{h}\,[u(x + h e_j) - u(x)] = \int_0^1 f(x + t h e_j)\,dt.$$

Since $f$ is continuous, it follows that we can take the limit $h \to 0$ of this equation and thus $u$ has a partial
derivative $D_j u(x)$ at the point $x$ in the classical sense, which is given by $f(x)$. Since
$x$ was arbitrary we conclude.


7.3 Convolution of Distributions 


As we learned earlier, the convolution product u * v is not defined for arbitrary pairs 
of functions (u,v). Some integrability conditions have to be satisfied. Often these 
integrability conditions are realized by support properties of the functions. Since the 
convolution product for distributions is to be defined in such a way that it is compatible 
with the embedding of functions, we will be able to define the convolution product 
for distributions under the assumption that the distributions satisfy a certain support 
condition which will be developed below. 

In order to motivate this support condition we calculate, for $f \in C_0(\mathbb{R}^n)$ and
$g \in C(\mathbb{R}^n)$, the convolution product $f * g$ that is known to be a continuous function
and thus can be considered as a distribution. For every test function $\phi$ the following
chain of identities holds:

$$\langle f * g, \phi \rangle = \int_{\mathbb{R}^n} (f * g)(x)\,\phi(x)\,dx = \int_{\mathbb{R}^n} \Big( \int_{\mathbb{R}^n} f(x - y)\,g(y)\,dy \Big) \phi(x)\,dx$$

$$= \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x - y)\,g(y)\,\phi(x)\,dy\,dx = \int_{\mathbb{R}^n \times \mathbb{R}^n} f(z)\,g(y)\,\phi(z + y)\,dy\,dz$$

$$= \langle (I_f \otimes I_g)(z, y), \phi(z + y) \rangle,$$

where we used Fubini's theorem for functions and the definition of the tensor product
of regular distributions. Thus, in order to ensure compatibility with the embedding of
functions, one has to define the convolution product for distributions $T, S \in \mathcal{D}'(\mathbb{R}^n)$
according to the formula

$$\langle T * S, \phi \rangle = \langle (T \otimes S)(x, y), \phi(x + y) \rangle \qquad \forall\, \phi \in \mathcal{D}(\mathbb{R}^n) \tag{7.5}$$

whenever the right hand side makes sense. Given $\phi \in \mathcal{D}(\mathbb{R}^n)$, the function $\psi = \psi_\phi$
defined on $\mathbb{R}^n \times \mathbb{R}^n$ by $\psi(x, y) = \phi(x + y)$ is certainly a function of class $C^\infty$ but
never has a compact support in $\mathbb{R}^n \times \mathbb{R}^n$ if $\phi \neq 0$. Thus in general the right hand
side of Eq. (7.5) is not defined. There is an obvious and natural way to ensure the
proper definition of the right hand side. Suppose $\operatorname{supp}(T \otimes S) \cap \operatorname{supp} \psi_\phi$ is compact
in $\mathbb{R}^n \times \mathbb{R}^n$ for all $\phi \in \mathcal{D}(\mathbb{R}^n)$. Then one would expect that this definition will work.
The main result of this section will confirm this. In order that this condition holds,
the supports of the distributions $T$ and $S$ have to be in a special relation.


Definition 7.2 Two distributions $T, S \in \mathcal{D}'(\mathbb{R}^n)$ are said to satisfy the support
condition if, and only if, for every compact set $K \subset \mathbb{R}^n$ the set

$$K_{T,S} = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^n : x \in \operatorname{supp} T,\ y \in \operatorname{supp} S,\ x + y \in K\}$$

is compact in $\mathbb{R}^n \times \mathbb{R}^n$.

Note that the set $K_{T,S}$ is always closed, but it need not be bounded. To get an idea
about how this support condition can be realized, we consider several examples.
Given $T, S \in \mathcal{D}'(\mathbb{R}^n)$ denote $F = \operatorname{supp} T$ and $G = \operatorname{supp} S$.


1. Suppose $F \subset \mathbb{R}^n$ is compact. Since $K_{T,S}$ is contained in the compact set
   $F \times (K - F)$ it is compact and thus the pair of distributions $(T, S)$ satisfies the support
   condition.

2. Consider the case $n = 1$ and suppose $F = [a, +\infty)$ and $G = [b, +\infty)$ for some
   given numbers $a, b \in \mathbb{R}$. Given a compact set $K \subset \mathbb{R}$ it is contained in some
   closed and bounded interval $[-k, +k]$. A simple calculation shows that in this
   case $K_{T,S} \subseteq [a, k - b] \times [b, k - a]$, and it follows that $K_{T,S}$ is compact. Hence
   the support condition holds.

3. For two closed convex cones $C_1, C_2 \subset \mathbb{R}^n$, $n \geq 2$, with vertices at the origin
   and two points $a_i \in \mathbb{R}^n$, consider $F = a_1 + C_1$ and $G = a_2 + C_2$. Suppose
   that the cones have the following property: given any compact set $K \subset \mathbb{R}^n$ there
   are compact sets $K_1, K_2 \subset \mathbb{R}^n$ with the property that $x_i \in C_i$ and $x_1 + x_2 \in K$
   implies $x_i \in K_i \cap C_i$ for $i = 1, 2$. Then the support condition is satisfied. The
   proof is given as an exercise.

4. This is a special case of the previous example. In the previous example we consider
   the cones $C_1 = C_2 = C = \big\{ x \in \mathbb{R}^n : x_1 \geq \theta \sqrt{\textstyle\sum_{j=2}^n x_j^2} \big\}$ for some $\theta > 0$. Again
   we leave the proof as an exercise that the support condition holds in this case.
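
The inclusion claimed in the second example is easy to probe by random sampling. The following sketch is only an illustration under stated assumptions: the function names and the sampling window are hypothetical, and a sampling test is of course no substitute for the one-line proof ($x \leq k - y \leq k - b$ and likewise for $y$).

```python
import random

def in_K_TS(x, y, a, b, k):
    """Membership in K_{T,S} for F = [a, inf), G = [b, inf), K = [-k, k]."""
    return x >= a and y >= b and -k <= x + y <= k

def bounding_box(a, b, k):
    """The box [a, k - b] x [b, k - a] from the second example."""
    return (a, k - b), (b, k - a)

def sample_check(a, b, k, trials=20000, spread=50.0, seed=0):
    """Sample points of F x G and verify that those in K_{T,S} lie in the box.

    Returns (all_inside, number_of_points_found_in_K_TS).
    """
    rng = random.Random(seed)
    (x_lo, x_hi), (y_lo, y_hi) = bounding_box(a, b, k)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(a, a + spread)
        y = rng.uniform(b, b + spread)
        if in_K_TS(x, y, a, b, k):
            hits += 1
            if not (x_lo <= x <= x_hi and y_lo <= y <= y_hi):
                return False, hits
    return True, hits

if __name__ == "__main__":
    print(sample_check(-1.0, -2.0, 3.0))
```

By contrast, for $F = G = \mathbb{R}$ the points $(t, -t)$, $t \in \mathbb{R}$, all satisfy $x + y = 0 \in K$, so $K_{T,S}$ is unbounded and the support condition fails.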


Theorem 7.3 (Definition of Convolution) If two distributions $T, S \in \mathcal{D}'(\mathbb{R}^n)$ satisfy
the support condition, then the convolution product $T * S$ is a distribution on
$\mathbb{R}^n$, well defined by the formula (7.5), i.e., by

$$\langle T * S, \phi \rangle = \langle (T \otimes S)(x, y), \phi(x + y) \rangle \qquad \forall\, \phi \in \mathcal{D}(\mathbb{R}^n).$$

Proof Given a compact set $K \subset \mathbb{R}^n$, there are two compact sets $K_1, K_2 \subset \mathbb{R}^n$
such that $K_{T,S} \subseteq K_1 \times K_2$, since the given distributions $T, S$ satisfy the support
condition. Now choose a test function $\psi \in \mathcal{D}(\mathbb{R}^n \times \mathbb{R}^n)$ such that $\psi(x, y) = 1$ for
all $(x, y) \in K_1 \times K_2$. It follows that for all $\phi \in \mathcal{D}_K(\mathbb{R}^n)$ the function
$(x, y) \mapsto (1 - \psi(x, y))\,\phi(x + y)$ has its support in $(\mathbb{R}^n \times \mathbb{R}^n) \setminus (K_1 \times K_2)$ and thus, because of the
support condition,

$$\langle (T \otimes S)(x, y), \phi(x + y) \rangle = \langle (T \otimes S)(x, y), \psi(x, y)\,\phi(x + y) \rangle \qquad \forall\, \phi \in \mathcal{D}_K(\mathbb{R}^n).$$

By Theorem 6.3 we conclude that the right hand side of the above identity is a
continuous linear functional on $\mathcal{D}_K(\mathbb{R}^n)$. Thus we get a well-defined continuous
linear functional $(T * S)_K$ on $\mathcal{D}_K(\mathbb{R}^n)$.

Let $K_i$, $i \in \mathbb{N}$, be a strictly increasing sequence of compact sets which exhaust
$\mathbb{R}^n$. The above argument gives a corresponding sequence of functionals $(T * S)_{K_i}$. It
is straightforward to show that these functionals satisfy the compatibility condition
$(T * S)_{K_{i+1}} | \mathcal{D}_{K_i}(\mathbb{R}^n) = (T * S)_{K_i}$, $i \in \mathbb{N}$, and therefore this sequence of functionals
defines a unique distribution on $\mathbb{R}^n$ (see Proposition 5.1), which is denoted by $T * S$
and is called the convolution of $T$ and $S$.


Theorem 7.4 (Properties of Convolution)

1. Suppose that two distributions $T, S \in \mathcal{D}'(\mathbb{R}^n)$ satisfy the support condition. Then
   the convolution has the following properties:
   a) $T * S = S * T$, i.e., the convolution product is commutative;
   b) $\operatorname{supp}(T * S) \subseteq \overline{\operatorname{supp} T + \operatorname{supp} S}$;
   c) For all $\alpha \in \mathbb{N}^n$ one has $D^\alpha (T * S) = D^\alpha T * S = T * D^\alpha S$.

2. The convolution with Dirac's delta distribution $\delta$ is defined for every $T \in \mathcal{D}'(\mathbb{R}^n)$
   and one has
   $$\delta * T = T.$$

3. Suppose three distributions $S, T, U \in \mathcal{D}'(\mathbb{R}^n)$ are given whose supports satisfy
   the following condition: For every compact set $K \subset \mathbb{R}^n$ the set

   $$\{(x, y, z) \in \mathbb{R}^{3n} : x \in \operatorname{supp} S,\ y \in \operatorname{supp} T,\ z \in \operatorname{supp} U,\ x + y + z \in K\}$$

   is compact in $\mathbb{R}^{3n}$. Then all the convolutions $S * T$, $(S * T) * U$, $T * U$, $S * (T * U)$
   are well defined and one has

   $$(S * T) * U = S * (T * U).$$


Proof Note that the pair of distributions $(S, T)$ satisfies the support condition if, and
only if, the pair $(T, S)$ does. Thus with $T * S$ also the convolution $S * T$ is well defined
by the above theorem. The right hand side of the defining formula (7.5) of the convolution
product is invariant under the exchange of $T$ and $S$. Therefore commutativity of the
convolution follows and proves part a) of 1).

Denote $C = \overline{\operatorname{supp} T + \operatorname{supp} S}$ and consider a test function $\phi$ with support in $\mathbb{R}^n \setminus C$.
Then $\phi(x + y) = 0$ for all $(x, y) \in \operatorname{supp} T \times \operatorname{supp} S$ and thus $\langle (T \otimes S)(x, y), \phi(x + y) \rangle = 0$,
and it follows that $\operatorname{supp}(T * S) \subseteq (\mathbb{R}^n \setminus C)^c = C$, which proves part b).

The formula for the derivatives of the convolution follows from the formula for
the derivatives of tensor products (part b) of Corollary 6.2) and the defining identity
for the convolution. The details are given in the following chain of identities, for
$\phi \in \mathcal{D}(\mathbb{R}^n)$:

$$\begin{aligned}
\langle D^\alpha (T * S), \phi \rangle &= (-1)^{|\alpha|} \langle T * S, D^\alpha \phi \rangle \\
&= (-1)^{|\alpha|} \langle (T \otimes S)(x, y), (D^\alpha \phi)(x + y) \rangle \\
&= (-1)^{|\alpha|} \langle T(x), \langle S(y), D_y^\alpha \phi(x + y) \rangle \rangle \\
&= \langle T(x), \langle (D^\alpha S)(y), \phi(x + y) \rangle \rangle \\
&= \langle (T \otimes D^\alpha S)(x, y), \phi(x + y) \rangle = \langle T * D^\alpha S, \phi \rangle.
\end{aligned}$$

Thus $D^\alpha (T * S) = T * D^\alpha S$ and in the same way $D^\alpha (T * S) = D^\alpha T * S$. This proves
part c).

Dirac's delta distribution $\delta$ has the compact support $\{0\}$, hence for any distribution
$T$ on $\mathbb{R}^n$ the pair $(\delta, T)$ satisfies the support condition. Therefore the convolution $\delta * T$
is well defined. If we evaluate this product on any $\phi \in \mathcal{D}(\mathbb{R}^n)$ we find, using again
Theorem 6.3,

$$\langle \delta * T, \phi \rangle = \langle (\delta \otimes T)(x, y), \phi(x + y) \rangle = \langle T(y), \langle \delta(x), \phi(x + y) \rangle \rangle = \langle T(y), \phi(y) \rangle$$

and we conclude $\delta * T = T$.

The proof of the third part about the threefold convolution product is left as an
exercise.


Remark 7.2

1. As we have seen above, the support condition for two distributions $T, S$ on $\mathbb{R}^n$
   is sufficient for the existence of the convolution product $T * S$. Note that this
   condition is not necessary. This is easily seen on the level of functions. Consider
   two functions $f, g \in L^2(\mathbb{R}^n)$. Application of the Cauchy-Schwarz inequality
   (Corollary 15.1) implies, for almost all $x \in \mathbb{R}^n$, $|f * g(x)| \leq \|f\|_2 \|g\|_2$ and
   hence the convolution product of $f$ and $g$ is well defined as an essentially bounded
   function on $\mathbb{R}^n$.

2. The simple identity $D^\alpha (T * \delta) = (D^\alpha \delta) * T = D^\alpha T$ will later allow us to write
   linear partial differential equations with constant coefficients as a convolution
   identity and through this a fairly simple algebraic formalism will lead to a solution.

3. If either $\operatorname{supp} T$ or $\operatorname{supp} S$ is compact, then $\operatorname{supp} T + \operatorname{supp} S$ is closed and in
   part 1.b) of Theorem 7.4 the closure sign can be omitted. However, when neither
   $\operatorname{supp} T$ nor $\operatorname{supp} S$ is compact, then the sum $\operatorname{supp} T + \operatorname{supp} S$ is in general not
   closed as the following simple example shows: consider $T, S \in \mathcal{D}'(\mathbb{R}^2)$ with

   $$\operatorname{supp} T = \{(x, y) \in \mathbb{R}^2 : 0 \leq x,\ +1 \leq xy\},$$

   $$\operatorname{supp} S = \{(x, y) \in \mathbb{R}^2 : 0 \leq x,\ xy \leq -1\}.$$

   Then the sum is

   $$\operatorname{supp} T + \operatorname{supp} S = \{(x, y) \in \mathbb{R}^2 : 0 < x\}$$

   and thus not closed.
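
The claim about the vector sum in the last example can be made concrete by an explicit decomposition. The sketch below is an illustration only; the function names are hypothetical and the particular splitting (equal first coordinates, with a safety factor so that the products stay well away from $\pm 1$ in floating point) is just one of many possible choices.

```python
def in_supp_T(p):
    """Membership in {(x, y) : 0 <= x, 1 <= x * y}."""
    x, y = p
    return x >= 0 and x * y >= 1

def in_supp_S(p):
    """Membership in {(x, y) : 0 <= x, x * y <= -1}."""
    x, y = p
    return x >= 0 and x * y <= -1

def decompose(x, y):
    """Write (x, y) with x > 0 as p + q with p in supp T and q in supp S."""
    if x <= 0:
        # Every point of either support has a strictly positive first coordinate,
        # so no sum of two such points can have x <= 0.
        raise ValueError("no decomposition exists for x <= 0")
    half = x / 2.0
    y1 = max(4.0 / x, y + 4.0 / x)   # guarantees half * y1 >= 2 >= 1
    y2 = y - y1                      # then y2 <= -4 / x, so half * y2 <= -2 <= -1
    return (half, y1), (half, y2)

if __name__ == "__main__":
    for point in [(0.5, 3.0), (2.0 ** -20, -7.0), (4.0, 0.0)]:
        p, q = decompose(*point)
        print(point, p, q, in_supp_T(p), in_supp_S(q))
```

Points with arbitrarily small $x > 0$ are reached, while no point with $x = 0$ is, so the boundary line $x = 0$ consists of limit points of the sum that do not belong to it.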


The regularization $T * \phi$ of a distribution $T$ by a test function $\phi$ is a $C^\infty$-function by
Theorem 7.1 and thus defines a regular distribution $I_{T * \phi}$. Certainly, the test function
$\phi$ defines a regular distribution $I_\phi$ and so one can ask whether the convolution product
of this regular distribution with the distribution $T$ exists and what this convolution
is. The following corollary answers this question and provides important additional
information.

Corollary 7.3 Let $T \in \mathcal{D}'(\mathbb{R}^n)$ be a distribution on $\mathbb{R}^n$.

a) For all $\phi \in \mathcal{D}(\mathbb{R}^n)$ the convolution (in the sense of distributions) $T * I_\phi$ exists
   and one has
   $$T * I_\phi = I_{T * \phi}.$$

b) Suppose $T$ has a compact support. Then, for every $f \in C^\infty(\mathbb{R}^n)$, the convolution
   $T * I_f$ exists and is a $C^\infty$-function. One has
   $$T * I_f = I_{T * f},$$
   i.e., $T * f$ is a $C^\infty$-function.


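Proof Since the regular distribution $I_\phi$ has a compact support, the support condition
is satisfied for the pair $(T, I_\phi)$ and therefore Theorem 7.3 proves the existence of the
convolution $T * I_\phi$, and for $\psi \in \mathcal{D}(\mathbb{R}^n)$ the following chain of identities holds.

$$\begin{aligned}
\langle T * I_\phi, \psi \rangle &= \langle (T \otimes I_\phi)(x, y), \psi(x + y) \rangle = \langle T(x), \langle I_\phi(y), \psi(x + y) \rangle \rangle \\
&= \Big\langle T(x), \int \phi(y)\,\psi(x + y)\,dy \Big\rangle = \Big\langle T(x), \int \phi(z - x)\,\psi(z)\,dz \Big\rangle \\
&= \int \langle T(x), \phi(z - x) \rangle\,\psi(z)\,dz = \int (T * \phi)(z)\,\psi(z)\,dz = \langle I_{T * \phi}, \psi \rangle.
\end{aligned}$$

The key step in this chain of identities is the proof of the identity

$$\Big\langle T(x), \int \phi(z - x)\,\psi(z)\,dz \Big\rangle = \int \langle T(x), \phi(z - x) \rangle\,\psi(z)\,dz \tag{7.6}$$

and this is given in the Exercises. This proves part a).

If the support $K$ of $T$ is compact, we know by Theorem 7.3 that the convolution
$T * I_f$ is a well-defined distribution, for every $f \in C^\infty(\mathbb{R}^n)$. In order to show that
$T * I_f$ is actually a $C^\infty$-function, choose some $\psi \in \mathcal{D}(\mathbb{R}^n)$ such that $\psi(x) = 1$ for
all $x$ in a neighborhood of $K$. For all $\phi \in \mathcal{D}(\mathbb{R}^n)$ we have $\operatorname{supp}(\phi - \psi\phi) \subseteq K^c = \mathbb{R}^n \setminus K$ and therefore
$\langle T, \phi - \psi\phi \rangle = 0$. This shows that $T = \psi \cdot T$. Thus, for every $\phi \in \mathcal{D}(\mathbb{R}^n)$ we can
define a function $h_\phi$ on $\mathbb{R}^n$ by

$$h_\phi(x) = \langle T(y), \phi(x + y) \rangle = \langle T(y), \psi(y)\,\phi(x + y) \rangle \qquad \forall\, x \in \mathbb{R}^n.$$

Corollary 6.1 implies that $h_\phi$ is a $C^\infty$-function with support in $\operatorname{supp} \phi - K$. Similarly,
Corollary 6 shows that the function

$$g(z) = \langle T(y), \psi(y)\,f(z - y) \rangle$$

is of class $C^\infty$ on $\mathbb{R}^n$. Now we calculate, for all $\phi \in \mathcal{D}(\mathbb{R}^n)$,

$$\begin{aligned}
\langle T * I_f, \phi \rangle &= \langle (\psi \cdot T) * I_f, \phi \rangle = \langle I_f(x), \langle (\psi \cdot T)(y), \phi(x + y) \rangle \rangle \\
&= \langle I_f(x), \langle T(y), \psi(y)\,\phi(x + y) \rangle \rangle = \int \langle T(y), \psi(y)\,f(z - y) \rangle\,\phi(z)\,dz.
\end{aligned}$$

Hence $T * I_f$ is equal to $I_g$. Since obviously $g = T * f$, part b) follows.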

From the point of view of practical applications of the convolution of distributions, 
it is important to know distinguished sets of distributions such that, for any pair in 
this set, the convolution is well defined. We present here a concrete example of such 
a set which later will play an important role in the symbolic calculus. Introduce the 
set of all distributions on the real line that have their support on the positive half-line: 


$$\mathcal{D}'_+(\mathbb{R}) = \{T \in \mathcal{D}'(\mathbb{R}) \mid \operatorname{supp} T \subseteq [0, +\infty)\}.$$


With regard to convolution this set has quite interesting properties as the following 
theorem shows. 


Theorem 7.5

a) $\mathcal{D}'_+(\mathbb{R})$, equipped with the convolution as a product, is an Abelian algebra with
   Dirac's delta distribution $\delta$ as the neutral element. It is however not a field.
b) $(\mathcal{D}'_+(\mathbb{R}), *)$ has no divisors of zero (Theorem of Titchmarsh).

Proof It is easily seen that any two elements $T, S \in \mathcal{D}'_+(\mathbb{R})$ satisfy the support
condition (compare the second example in the discussion of this condition). Hence
by Theorem 7.3 the convolution is well defined on $\mathcal{D}'_+(\mathbb{R})$. By Theorem 7.4 this
product is Abelian and $T * S$ has its support in $[0, +\infty)$ for all $T, S \in \mathcal{D}'_+(\mathbb{R})$ and
the neutral element is $\delta$. $\mathcal{D}'_+(\mathbb{R})$ is not a field under the convolution since there are
elements in $\mathcal{D}'_+(\mathbb{R})$ that have no inverse with respect to the convolution product
though they are different from zero. Take for example a test function $\phi \in \mathcal{D}(\mathbb{R})$
with support in $\mathbb{R}^+ = [0, +\infty)$, $\phi \neq 0$. Then the regular distribution $I_\phi$ belongs to
$\mathcal{D}'_+(\mathbb{R})$, and there is no $T \in \mathcal{D}'_+(\mathbb{R})$ such that $T * I_\phi = \delta$, since by Corollary 7.3 one
has $T * I_\phi = I_{T * \phi}$ and by Theorem 7.1 it is known that $T * \phi \in C^\infty(\mathbb{R})$, whereas $\delta$ is
not a regular distribution. This proves part a).

Statement b) means: if $T, S \in \mathcal{D}'_+(\mathbb{R})$ are given and $T * S = 0$, then either $T = 0$
or $S = 0$. The proof is somewhat involved and we refer the reader to [1].


Remark 7.3

1. The convolution product is not associative. Here is a simple example. Observe
   that $\delta' * \theta = D(\delta * \theta) = D\theta = \delta$, hence

   $$1 * (\delta' * \theta) = 1 * \delta = 1.$$

   Similarly, $1 * \delta' = D(1 * \delta) = D1 = 0$, hence

   $$(1 * \delta') * \theta = 0.$$

2. For the proof that $(\mathcal{D}'_+(\mathbb{R}), *)$ has no divisors of zero, the support properties
   are essential. In $(\mathcal{D}'(\mathbb{R}), *)$ we can easily construct counterexamples. Since $\delta' \in
   \mathcal{D}'(\mathbb{R})$ has a compact support, we know that $\delta' * 1$ is a well-defined distribution
   on $\mathbb{R}$. We also know $\delta' \neq 0$ and $1 \neq 0$, but as we have seen above, $\delta' * 1 = 0$.

3. Fix $S \in \mathcal{D}'(\mathbb{R}^n)$ and assume that $S$ has a compact support. Then we can consider
   the map $\mathcal{D}'(\mathbb{R}^n) \to \mathcal{D}'(\mathbb{R}^n)$ given by $T \mapsto T * S$. It is important to realize that
   this map is not continuous. Take for example the distributions $T_n = \delta_n$, $n \in \mathbb{N}$,
   i.e., $T_n(\phi) = \phi(n)$ for all $\phi \in \mathcal{D}(\mathbb{R})$. Then $1 * T_n = 1$ for all $n \in \mathbb{N}$, but

   $$\lim_{n \to \infty} T_n = 0 \qquad \text{in } \mathcal{D}'(\mathbb{R}).$$

4. Recall the definition of Cauchy's principal value $\mathrm{vp}\,\frac{1}{x}$ in Eq. (3.7). It can be
   used to define a transformation $H$, called the Hilbert transform, by convolution
   $f \mapsto H(f) = \mathrm{vp}\,\frac{1}{x} * f$:

   $$H(f)(x) = \lim_{\varepsilon \to 0} \int_{|x - y| \geq \varepsilon} \frac{f(y)}{x - y}\,dy \qquad \forall\, x \in \mathbb{R}. \tag{7.7}$$

   This transformation is certainly well defined on test functions. It is not difficult
   to show that it is also well defined on all $f \in C^1(\mathbb{R})$ with the following decay
   property: for every $x \in \mathbb{R}$ there is a constant $C$ and an exponent $\alpha > 0$ such that
   for all $y \in \mathbb{R}$, $|y| \geq 1$, the estimate

   $$|f(x - y) - f(x + y)| \leq C\,|y|^{-\alpha}$$

   holds. This Hilbert transform is used in the formulation of "dispersion relations",
   which play an important role in various branches of physics (see [2]).




7.4 Exercises 


1. The sum $A + B$ of two subsets $A$ and $B$ of a vector space $V$ is by definition the set
   $A + B = \{x + y \in V : x \in A,\ y \in B\}$. For compact subsets $A, B \subset \mathbb{R}^n$ prove
   that the closure of the sum is equal to the sum:

   $$\overline{A + B} = A + B.$$

   Give an example of two closed sets $A, B \subset \mathbb{R}^n$ such that the sum $A + B$ is not
   closed.

2. Fill in the details in the proof of Proposition 7.1.

3. Prove Corollary 7.2.

4. For $T \in \mathcal{D}'(\mathbb{R}^n)$ and $\phi, \psi \in \mathcal{D}(\mathbb{R}^n)$ prove the important identity (7.6)

   $$\Big\langle T(x), \int \phi(z - x)\,\psi(z)\,dz \Big\rangle = \int \langle T(x), \phi(z - x) \rangle\,\psi(z)\,dz.$$

   Hint: One can use, for instance, the representation theorem of distributions as
   weak derivatives of integrable functions (Theorem 5.1).

5. Prove: $\mathcal{D}(\mathbb{R}^n)$ is (sequentially) dense in $\mathcal{D}'(\mathbb{R}^n)$.

6. Prove Part 3 of Theorem 7.4.


References 


1. Gel'fand IM, Šilov GE. Generalized functions I: properties and operations. 5th ed. New York:
   Academic Press; 1977.
2. Thirring W. A course in mathematical physics: classical dynamical systems and classical field
   theory. Springer study edition. New York: Springer-Verlag; 1992.


Chapter 8 
Applications of Convolution 


The four sections of this chapter introduce various applications of the convolution 
product, for functions and distributions. The common core of these sections is a 
convolution equation, i.e., a relation of the form 


$$T * X = S,$$


where $T, S$ are given distributions and $X$ is a distribution that we want to find in a
suitable space of distributions. We will learn that various problems in mathematics
and physics can be written as convolution equations. As simple examples, the cases of
ordinary and partial linear differential equations with constant coefficients as well as
a well-known integral equation are discussed. Of course, in the study of convolution
equations we encounter the following problems:

1. Existence of a solution: Given two distributions $T, S$, is there a distribution $X$ in
   a suitable space of distributions, such that $T * X = S$ holds?

2. Uniqueness: If a solution exists, is it the only solution to this equation (in a given
   space of distributions)?


An ideal situation would be if we could treat the convolution equation in a space of
distributions, which is an algebra with respect to the convolution product. Then, if $T$
is invertible in this convolution algebra, the unique solution to the equation $T * X = S$
obviously is $X = T^{-1} * S$. If however $T$ is not invertible in the convolution algebra,
the equation might not have any solution, or it might have several solutions.

Unfortunately this ideal case hardly occurs in the study of concrete problems. We
discuss a few cases. Earlier we saw that the space of all distributions is not an algebra
with respect to convolution. The space $L^1(\mathbb{R}^n)$ of Lebesgue integrable functions on
$\mathbb{R}^n$ is an algebra for the convolution but this space is not suitable for the study
of differential operators. The space $\mathcal{E}'$ of distributions with compact support is an
algebra for the convolution, but not very useful since there is hardly any differential
operator that is invertible in $\mathcal{E}'$. The space of distributions with support in a given
cone can be shown to be an algebra for the convolution. It can be used for the study
of special partial differential operators with constant coefficients. Thus we are left
with the convolution algebra $\mathcal{D}'_+(\mathbb{R})$ studied in Theorem 7.5.


© Springer International Publishing Switzerland 2015 101 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_8 




8.1 Symbolic Calculus—Ordinary Linear 
Differential Equations 


Suppose we are given an ordinary linear differential equation with constant coefficients

$$\sum_{n=0}^{N} a_n y^{(n)} = f, \tag{8.1}$$

where the $a_n$ are given real or complex numbers and $f$ is a given continuous function
on the positive half-line $\mathbb{R}^+$. Here, $y^{(n)} = D^n y$ denotes the derivative of order $n$ of
the function $y$ with respect to the variable $x$, $D = \frac{d}{dx}$.

By developing a symbolic calculus with the help of the convolution algebra
$(\mathcal{D}'_+(\mathbb{R}), *)$ of Theorem 7.5, we will learn how to reduce the problem of finding
solutions of Eq. (8.1) to a purely algebraic problem that is known to have solutions.
The starting point is to consider Eq. (8.1) as an equation in $\mathcal{D}'_+(\mathbb{R})$ and to write it as
a convolution equation

$$\Big( \sum_{n=0}^{N} a_n \delta^{(n)} \Big) * y = f. \tag{8.2}$$

The rules for derivatives of convolution products and the fact that Dirac's delta
distribution is the unit of the convolution algebra $(\mathcal{D}'_+(\mathbb{R}), *)$ imply that

$$y^{(n)} = \delta * y^{(n)} = \delta^{(n)} * y,$$

and thus by distributivity of the convolution product, Eqs. (8.1), (8.2), and (8.3) are
equivalent. In this way we assign to the differential operator

$$P(D) = \sum_{n=0}^{N} a_n D^n,$$

the element

$$P(\delta) = \sum_{n=0}^{N} a_n \delta^{(n)} \in \mathcal{D}'_+(\mathbb{R})$$

with support in $\{0\}$ such that

$$P(D)\,y = P(\delta) * y \tag{8.3}$$

on $\mathcal{D}'_+(\mathbb{R})$. Thus we can solve Eq. (8.1) by showing that the element $P(\delta)$ has an
inverse in $(\mathcal{D}'_+(\mathbb{R}), *)$. We prepare the proof of this claim by a simple lemma.




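Lemma 8.1 The distribution $\delta' - \lambda\delta \in \mathcal{D}'_+(\mathbb{R})$, $\lambda \in \mathbb{C}$, has a unique inverse given
by the regular distribution

$$e_\lambda(x) = I_{\theta(x) e^{\lambda x}} \in \mathcal{D}'_+(\mathbb{R}).$$

Proof The proof consists of a sequence of straightforward calculations using the
rules established earlier for the convolution. We have
$(\delta' - \lambda\delta) * e_\lambda = \delta' * e_\lambda - \lambda\delta * e_\lambda = \delta' * e_\lambda - \lambda e_\lambda$ and

$$\delta' * e_\lambda = D(\delta * e_\lambda) = D e_\lambda = e^{\lambda x} D\theta + \lambda e_\lambda = \delta + \lambda e_\lambda,$$

where we have used the differentiation rules of distributions and the fact that $D\theta = \delta$.
It follows that

$$(\delta' - \lambda\delta) * e_\lambda = \delta$$

and therefore $e_\lambda$ is an inverse of $\delta' - \lambda\delta$ in $\mathcal{D}'_+(\mathbb{R})$. Since $\mathcal{D}'_+(\mathbb{R})$ has no divisors of
zero, the inverse is unique.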
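
Lemma 8.1 says that $y = e_\lambda * f$ solves $y' - \lambda y = f$ in $\mathcal{D}'_+(\mathbb{R})$. For a continuous right hand side this can be checked numerically. The following sketch is illustrative only: the function names, the midpoint quadrature, and the finite-difference step are assumptions made for the example, not part of the text.

```python
import math

def e_lambda(lam, x):
    """Density of the regular distribution e_lambda: theta(x) * exp(lam * x)."""
    return math.exp(lam * x) if x >= 0 else 0.0

def convolve_at(lam, f, x, steps=2000):
    """(e_lambda * f)(x) for f supported in [0, inf): integral over s in [0, x]
    of exp(lam * (x - s)) * f(s), approximated by the midpoint rule."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(
        e_lambda(lam, x - (i + 0.5) * h) * f((i + 0.5) * h) for i in range(steps)
    )

def residual(lam, f, x, dx=1e-4):
    """Central-difference estimate of y' - lam * y - f at x > dx, y = e_lambda * f."""
    y = lambda t: convolve_at(lam, f, t)
    dy = (y(x + dx) - y(x - dx)) / (2.0 * dx)
    return dy - lam * y(x) - f(x)

if __name__ == "__main__":
    # For lam = -1 and f = theta, the convolution is 1 - exp(-x) for x >= 0.
    print(convolve_at(-1.0, lambda s: 1.0, 2.0), 1.0 - math.exp(-2.0))
    print(residual(-1.0, math.cos, 1.5))
```

The residual should be close to zero up to quadrature and finite-difference error, for any choice of $\lambda$ and continuous $f$.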


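Proposition 8.1 Let $P(x) = a_0 + a_1 x + \cdots + a_n x^n$ be a polynomial of degree $n$
($a_n \neq 0$) with complex coefficients $a_j$. Denote by $\{\lambda_1, \ldots, \lambda_p\}$ the set of zeros (roots)
of $P$ with multiplicities $\{k_1, \ldots, k_p\}$, i.e.,

$$P(x) = a_n (x - \lambda_1)^{k_1} \cdots (x - \lambda_p)^{k_p}.$$

Then

$$P(\delta) = a_0 \delta + a_1 \delta^{(1)} + \cdots + a_n \delta^{(n)} = a_n (\delta' - \lambda_1 \delta)^{*k_1} * \cdots * (\delta' - \lambda_p \delta)^{*k_p}$$

has an inverse in $\mathcal{D}'_+(\mathbb{R})$, which is given by

$$P(\delta)^{-1} = \frac{1}{a_n}\, e_{\lambda_1}^{*k_1} * \cdots * e_{\lambda_p}^{*k_p} \equiv E. \tag{8.4}$$

Proof For $\lambda, \mu \in \mathbb{C}$ calculate
$(\delta' - \lambda\delta) * (\delta' - \mu\delta) = \delta' * \delta' - \lambda\delta * \delta' - \delta' * \mu\delta + \lambda\mu\,\delta * \delta = \delta' * \delta' - (\lambda + \mu)\delta' + \lambda\mu\,\delta$,
where we used the distributive law for the convolution
and the fact that $\delta$ is the unit in $(\mathcal{D}'_+(\mathbb{R}), *)$. Using the differentiation rules we find
$\delta' * \delta' = D(\delta * \delta') = D(\delta') = \delta^{(2)}$. It follows that

$$(\delta' - \lambda\delta) * (\delta' - \mu\delta) = \delta^{(2)} - (\lambda + \mu)\delta' + \lambda\mu\,\delta.$$

In particular for $\lambda = \mu$, one has $(\delta' - \lambda\delta)^{*2} = \delta^{(2)} - 2\lambda\delta' + \lambda^2 \delta$, where for $S \in \mathcal{D}'_+(\mathbb{R})$
we use the notation $S^{*k} = S * \cdots * S$ ($k$ factors). Repeated application of this
argument implies (see exercises)

$$a_0 \delta + a_1 \delta^{(1)} + \cdots + a_n \delta^{(n)} = a_n (\delta' - \lambda_1 \delta)^{*k_1} * \cdots * (\delta' - \lambda_p \delta)^{*k_p}.$$

Knowing this factorization of $P(\delta)$, it is easy to show that the given element $E \in
\mathcal{D}'_+(\mathbb{R})$ is indeed the inverse of $P(\delta)$. Using the above lemma and the fact that
$(\mathcal{D}'_+(\mathbb{R}), *)$ is an Abelian algebra, we find

$$\begin{aligned}
\Big( \frac{1}{a_n}\, e_{\lambda_1}^{*k_1} * \cdots * e_{\lambda_p}^{*k_p} \Big) * P(\delta)
&= e_{\lambda_1}^{*k_1} * \cdots * e_{\lambda_p}^{*k_p} * (\delta' - \lambda_1 \delta)^{*k_1} * \cdots * (\delta' - \lambda_p \delta)^{*k_p} \\
&= [e_{\lambda_1} * (\delta' - \lambda_1 \delta)]^{*k_1} * \cdots * [e_{\lambda_p} * (\delta' - \lambda_p \delta)]^{*k_p} = \delta^{*k_1} * \cdots * \delta^{*k_p} = \delta.
\end{aligned}$$

Hence, the element $E \in \mathcal{D}'_+(\mathbb{R})$ is the inverse of $P(\delta)$.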

After these preparations it is fairly easy to solve ordinary differential equations of
the form (8.1), even for all $f \in \mathcal{D}'_+(\mathbb{R})$. To this end rewrite Eq. (8.1) using relation
(8.2), as

$$P(\delta) * y = f,$$

and thus by Proposition 8.1, a solution is

$$y = P(\delta)^{-1} * f, \tag{8.5}$$

in particular, for $f = \delta$, $E = P(\delta)^{-1}$ is a special solution of the equation

$$P(D)\,T = \delta \qquad \text{in } \mathcal{D}'_+(\mathbb{R}). \tag{8.6}$$

This special solution $E$ is called a fundamental solution of the differential operator
$P(D)$. The above argument shows: Whenever we have a fundamental solution $E$ of
the differential operator $P(D)$, a solution $y$ of the equation $P(D)y = f$ for general
inhomogeneous term $f \in \mathcal{D}'_+(\mathbb{R})$ is given by

$$y = E * f.$$

Since the fundamental solution is expressed as a convolution product of explicitly
known functions, one can easily derive some regularity properties of solutions.


Theorem 8.1 Let $P(D) = \sum_{n=0}^{N} a_n D^n$ be an ordinary constant coefficient differential
operator normalized by $a_N = 1$, $N > 1$, and $E = P(\delta)^{-1}$ the fundamental
solution as determined above.

1. $E$ is a function of class $C^{N-2}(\mathbb{R})$ with support in $\mathbb{R}^+$.

2. Given an inhomogeneous term $f \in \mathcal{D}'_+(\mathbb{R})$, a solution of $P(D)y = f$ is
   $y = E * f$.

3. If the inhomogeneous term $f$ is a continuous function on $\mathbb{R}$ with support in $\mathbb{R}^+$,
   then the special solution $y = E * f$ of $P(D)y = f$ is a classical solution, i.e., a
   function of class $C^N(\mathbb{R})$, which satisfies the differential equation.


Proof According to Proposition 8.1, the fundamental solution $E$ has the representation
$E = e_{z_1} * \cdots * e_{z_N}$ where the $N$ roots $\{z_1, \ldots, z_N\}$ are not necessarily distinct.
Thus we can write $D^{N-2} E = D e_{z_1} * \cdots * D e_{z_{N-2}} * e_{z_{N-1}} * e_{z_N}$. Previous calculations
have shown that $D e_z = \delta + z e_z$. It follows that

$$\begin{aligned}
D^{N-2} E &= (\delta + z_1 e_{z_1}) * \cdots * (\delta + z_{N-2} e_{z_{N-2}}) * e_{z_{N-1}} * e_{z_N} \\
&= e_{z_{N-1}} * e_{z_N} + \sum_{j=1}^{N-2} z_j\, e_{z_j} * e_{z_{N-1}} * e_{z_N} + \cdots + z_1 \cdots z_{N-2}\, e_{z_1} * \cdots * e_{z_N}.
\end{aligned}$$

Next we determine the continuity properties of the convolution product $e_z * e_w$ of the
functions $e_z$ and $e_w$ for arbitrary $z, w \in \mathbb{C}$. According to the definition of the functions
$e_z$ and the convolution, we find

$$e_z * e_w(x) = \int_{\mathbb{R}} e_z(y)\,e_w(x - y)\,dy = \theta(x) \int_0^x e^{zy} e^{w(x - y)}\,dy = \theta(x)\,e^{wx} \int_0^x e^{(z - w)y}\,dy.$$

According to this representation, $e_z * e_w$ is a continuous function on $\mathbb{R}$ with support in
$\mathbb{R}^+$. It follows that also all convolution products with $m \geq 2$ factors are continuous
functions on $\mathbb{R}$ with support in $\mathbb{R}^+$. Hence the formula for $D^{N-2} E$ shows that $D^{N-2} E$
is continuous and thus $E$ has continuous derivatives up to order $N - 2$ on $\mathbb{R}$ and has
its support in $\mathbb{R}^+$. This proves the first part.
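
For orientation, the remaining integral in the formula for $e_z * e_w$ can be evaluated in closed form. The following short computation is added as an illustration; it uses only the representation just derived.

```latex
\[
(e_z * e_w)(x) =
\begin{cases}
\theta(x)\,\dfrac{e^{zx} - e^{wx}}{z - w}, & z \neq w, \\[2ex]
\theta(x)\,x\,e^{zx}, & z = w.
\end{cases}
\]
```

In both cases the function vanishes at $x = 0$, which makes the continuity on $\mathbb{R}$ explicit, while its one-sided derivative at $x = 0^+$ equals $1$. So $e_z * e_w$ is continuous but not continuously differentiable at the origin, in agreement with $E \in C^{N-2}(\mathbb{R})$ for $N = 2$.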

The second part has been shown above. In order to prove the third part we evaluate
$D^N (E * f) = (D^N E) * f$. As above we find

$$\begin{aligned}
D^N E &= D e_{z_1} * \cdots * D e_{z_N} = (\delta + z_1 e_{z_1}) * \cdots * (\delta + z_N e_{z_N}) \\
&= \delta + \sum_{j=1}^{N} z_j e_{z_j} + \sum_{i < j} z_i z_j\, e_{z_i} * e_{z_j} + \cdots + z_1 \cdots z_N\, e_{z_1} * \cdots * e_{z_N},
\end{aligned}$$

and therefore

$$D^N y = (D^N E) * f = f + \sum_{j=1}^{N} z_j\, e_{z_j} * f + \cdots + z_1 \cdots z_N\, e_{z_1} * \cdots * e_{z_N} * f.$$

This shows that the derivative of order $N$ of $y = E * f$, calculated in the sense
of distributions, is actually a continuous function. We conclude that this solution
is an $N$-times continuously differentiable function on $\mathbb{R}$ with support in $\mathbb{R}^+$, i.e., a
classical solution.


Remark 8.1 Theorem 8.1 reduces the problem of finding a solution of the ordinary 
differential equation P(D)y = f to the algebraic problem of finding all the roots 
{A1, nw, v} of the polynomial P(x) and their multiplicities {ki, bts kp}. 

A simple concrete example will illustrate how convenient the application of The- 
orem 8.1 is in solving ordinary differential equations. Consider an electrical circuit 
in which a capacitor C, an inductance L, and a resistance R are put in series and 
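connected to a power source of voltage $V(t)$. The current $I(t)$ in this circuit satisfies,
according to Kirchhoff’s law, the equation

\[
V(t) = R\,I(t) + L\,\frac{dI(t)}{dt} + \frac{1}{C} \int_0^t I(s)\,ds.
\]

Differentiation of this identity yields, using $D = \frac{d}{dt}$,

\[
DV(t) = L\,P(D)\,I(t), \qquad P(D) = D^2 + \frac{R}{L}\,D + \frac{1}{LC}.
\]

The roots of the polynomial $P(x) = x^2 + \frac{R}{L}x + \frac{1}{LC}$ are
$\lambda_{1,2} = -\frac{R}{2L} \pm \sqrt{\frac{R^2}{4L^2} - \frac{1}{LC}}$
and therefore, according to Theorem 8.1,

\[
I(t) = \frac{1}{L}\, E * DV(t) = \frac{1}{L}\, e_{\lambda_1} * e_{\lambda_2} * DV(t) = \frac{1}{L}\, (De_{\lambda_1}) * e_{\lambda_2} * V(t).
\]

Since we know $De_z = \delta + z e_z$, a special solution of the above differential equation
is

\[
I(t) = \frac{1}{L}\, (e_{\lambda_2} * V)(t) + \frac{\lambda_1}{L}\, (e_{\lambda_1} * e_{\lambda_2} * V)(t).
\]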
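
To see the formula at work, consider the special case of a step voltage $V(t) = V_0\,\theta(t)$, which is an assumption made here for illustration. Then $DV = V_0\,\delta$ and the solution reduces to $I = \frac{V_0}{L}\, e_{\lambda_1} * e_{\lambda_2}$. The Python sketch below evaluates this expression in closed form and checks by finite differences that $L I'' + R I' + I/C$ vanishes for $t > 0$. The component values and all function names are hypothetical.

```python
import cmath

def roots(R, L, C):
    """Roots lambda_1, lambda_2 of P(x) = x^2 + (R/L) x + 1/(L C)."""
    disc = cmath.sqrt((R / (2 * L)) ** 2 - 1 / (L * C))
    return -R / (2 * L) + disc, -R / (2 * L) - disc

def step_current(t, R, L, C, V0=1.0):
    """I(t) = (V0/L) (e_{l1} * e_{l2})(t) for the step voltage V(t) = V0 * theta(t)."""
    if t <= 0:
        return 0.0
    l1, l2 = roots(R, L, C)
    if l1 == l2:
        val = t * cmath.exp(l1 * t)
    else:
        val = (cmath.exp(l1 * t) - cmath.exp(l2 * t)) / (l1 - l2)
    return (V0 / L * val).real

def residual(t, R, L, C, h=1e-4):
    """Finite-difference value of L I'' + R I' + I/C, expected to be near 0 for t > 0."""
    i_m, i_0, i_p = (step_current(t + d, R, L, C) for d in (-h, 0.0, h))
    d1 = (i_p - i_m) / (2 * h)
    d2 = (i_p - 2 * i_0 + i_m) / h ** 2
    return L * d2 + R * d1 + i_0 / C

if __name__ == "__main__":
    R, L, C = 1.0, 0.5, 0.1  # sample values giving an underdamped circuit
    for t in (0.2, 0.7, 1.5):
        print(t, step_current(t, R, L, C), residual(t, R, L, C))
```

The current starts at $I(0) = 0$ with slope $V_0/L$, as one expects for an inductance in series with a suddenly applied voltage.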




8.2 Integral Equation of Volterra 


Given two continuous functions $g, K$ on $\mathbb{R}^+$, we look for all functions $f$ satisfying
Volterra’s linear integral equation

\[
f(x) - \int_0^x K(x-y) f(y)\,dy = g(x) \qquad \forall\, x \in \mathbb{R}^+. \tag{8.7}
\]


Integral equations of this type are for instance used in optics for the description of 
the distribution of brightness. 

How can one solve such equations? We present here a simple method based on
our knowledge of the convolution algebra $(\mathcal{D}'_+(\mathbb{R}), *)$. By identifying the functions
$f, g, K$ with the regular distributions $\theta f, \theta g, \theta K$ in $\mathcal{D}'_+(\mathbb{R})$, it is easy to rewrite
Eq. (8.7) as a convolution equation in $\mathcal{D}'_+(\mathbb{R})$:

\[
(\delta - K) * f = g. \tag{8.8}
\]

In order to solve this equation, we show that the element $\delta - K$ is invertible in $\mathcal{D}'_+(\mathbb{R})$.
This is done in the following proposition.


Proposition 8.2 If $K : \mathbb{R}^+ \to \mathbb{R}$ is a continuous function, then the element $\delta - K$
has an inverse in $\mathcal{D}'_+(\mathbb{R})$. This inverse is of the form

\[
(\delta - K)^{-1} = \delta + H,
\]

where $H$ is a continuous function $\mathbb{R}^+ \to \mathbb{R}$. Volterra’s integral equation has thus
exactly one solution that is of the form

\[
f = g + H * g.
\]

Proof We start with the well-known (in any ring with unit) identity

\[
\delta - K^{*(n+1)} = (\delta - K) * (\delta + K + K^{*2} + \cdots + K^{*n}), \tag{8.9}
\]

and show that the series $\sum_{i=1}^{\infty} K^{*i}$ converges uniformly on every compact subset of
$\mathbb{R}^+$. For this it suffices to show uniform convergence on every compact interval of
the form $[0, r]$ for $r > 0$. Since $K$ is continuous we know that $M_r = \sup_{0 \le x \le r} |K(x)|$
is finite for every $r > 0$. Observe that

\[
K^{*2}(x) = \int_0^x K(y) K(x-y)\,dy \;\Longrightarrow\; |K^{*2}(x)| \le M_r^2\, x,
\]

and therefore by induction (see exercises),

\[
|K^{*i}(x)| \le M_r^{i}\, \frac{x^{i-1}}{(i-1)!} \qquad \forall\, x \in [0, r].
\]

The estimate

\[
\sum_{i=1}^{\infty} |K^{*i}(x)| \le \sum_{i=1}^{\infty} M_r^{i}\, \frac{r^{i-1}}{(i-1)!} = M_r\, e^{M_r r}
\]

implies that the series $\sum_{i=1}^{\infty} K^{*i}(x)$ converges absolutely and uniformly on $[0, r]$, for
every $r > 0$. Hence this series defines a continuous function $H : \mathbb{R}^+ \to \mathbb{R}$,

\[
H = \sum_{i=1}^{\infty} K^{*i}.
\]

With this information we can pass to the limit $n \to \infty$ in Eq. (8.9) and find $\delta =
(\delta - K) * (\delta + H)$, hence $(\delta - K)^{-1} = \delta + H$, which proves the proposition since
the convolution algebra $(\mathcal{D}'_+(\mathbb{R}), *)$ is without divisors of zero and $\delta - K \neq 0$.
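
The series construction can be tried out numerically. The Python sketch below iterates $f \mapsto g + K * f$ on a uniform grid, using a trapezoidal rule for the convolution integral; in exact arithmetic the $k$-th iterate is $(\delta + K + \cdots + K^{*k}) * g$. As a test case we assume $K \equiv 1$ and $g \equiv 1$, for which $K^{*i}(x) = x^{i-1}/(i-1)!$, $H(x) = e^{x}$, and the solution is $f(x) = e^{x}$. The grid size, the iteration count and all function names are assumptions of this sketch, not part of the text.

```python
import math

def volterra_conv(K, f_vals, h):
    """Trapezoidal approximation of (K * f)(x_j) = int_0^{x_j} K(x_j - y) f(y) dy on x_j = j*h."""
    out = [0.0]
    for j in range(1, len(f_vals)):
        s = 0.5 * (K(j * h) * f_vals[0] + K(0.0) * f_vals[j])
        for i in range(1, j):
            s += K((j - i) * h) * f_vals[i]
        out.append(h * s)
    return out

def solve_volterra(K, g, r=1.0, n=200, iterations=40):
    """Iterate f <- g + K * f on [0, r]; this approximates (delta + K + ... + K^{*k}) * g."""
    h = r / n
    g_vals = [g(j * h) for j in range(n + 1)]
    f_vals = list(g_vals)
    for _ in range(iterations):
        conv = volterra_conv(K, f_vals, h)
        f_vals = [gv + cv for gv, cv in zip(g_vals, conv)]
    return f_vals

if __name__ == "__main__":
    f = solve_volterra(lambda x: 1.0, lambda x: 1.0)
    print(f[-1], math.e)  # value at x = 1 compared with exp(1)
```

The factorial decay in the estimate above is what makes a modest number of iterations sufficient on a bounded interval.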


8.3 Linear Partial Differential Equations
with Constant Coefficients


This section reports one of the main achievements of the theory of distributions, 
namely providing a powerful framework for solving linear partial differential equa- 
tions (PDEs) with constant coefficients. Using the multi-index notation, a linear 
partial differential operator with constant coefficients will generically be written as 


\[
P(D) = \sum_{|\alpha| \le m} a_\alpha D^\alpha, \qquad a_\alpha \in \mathbb{C}, \quad m = 1, 2, \ldots \tag{8.10}
\]

Suppose that $\Omega \subset \mathbb{R}^n$ is a nonempty open set. Certainly, operators of the form
(8.10) induce linear maps of the test function space over $\Omega$ into itself, and these maps
are continuous (see exercises). Thus, by duality, as indicated earlier, the operators
$P(D)$ can be considered as linear and continuous operators $\mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$. Then,
given $U \in \mathcal{D}'(\Omega)$, the distributional form of a linear PDE with constant coefficients
is

\[
P(D)T = U \quad \text{in } \mathcal{D}'(\Omega). \tag{8.11}
\]

Note that $T \in \mathcal{D}'(\Omega)$ is a distributional or weak solution of (8.11) if, and only if, for
all $\phi \in \mathcal{D}(\Omega)$, one has

\[
\langle U, \phi \rangle = \langle P(D)T, \phi \rangle = \langle T, P^t(D)\phi \rangle,
\]

where $P^t(D) = \sum_{|\alpha| \le m} (-1)^{|\alpha|} a_\alpha D^\alpha$. In many applications, however, one is not so
much interested in distributional solutions but in functions satisfying this PDE. If
the right-hand side $U$ is a continuous function, then a classical or strong solution of
Eq. (8.11) is a function $T$ on $\Omega$ which has continuous derivatives up to order $m$ and
which satisfies (8.11) in the sense of functions. As one would expect, it is easier to
find solutions to Eq. (8.11) in the much larger space $\mathcal{D}'(\Omega)$ of distributions than in
the subspace $C^{m}(\Omega)$ of $m$ times continuously differentiable functions. Nevertheless,
the problems typically require classical and not distributional solutions and thus the
question arises: when, i.e., for which differential operators, is a distributional solution
actually a classical solution? This is known to be the case for the so-called elliptic
operators. In this elliptic regularity theory one shows that, for these elliptic operators,
weak solutions are indeed classical solutions. This also applies to nonlinear PDEs.
In Part II, Chap. 32, we present without proof some classes of typical examples. We
mention here the earliest and quite typical result of the elliptic regularity theory, due
to H. Weyl (1940), for the Laplace operator.


Lemma 8.2 (Lemma of Weyl) Suppose that $T \in L^1_{\mathrm{loc}}(\Omega)$ is a solution of $\Delta T = 0$
in $\mathcal{D}'(\Omega)$, i.e., $\int T(x) \Delta\phi(x)\,dx = 0$ for all $\phi \in \mathcal{D}(\Omega)$. Then it follows that $T \in
C^{2}(\Omega)$ and $\Delta T(x) = 0$ holds in the sense of functions.

We remark that in the special case of the Laplace operator $\Delta$, one can actually
show $T \in C^{\infty}(\Omega)$. We conclude: In order to determine classical solutions of the
equation $\Delta T = 0$, $T \in C^{2}(\Omega)$, it is sufficient to determine weak solutions in the
much larger space $L^1_{\mathrm{loc}}(\Omega)$.

Naturally, not all differential operators have this very convenient regularity property.
As a simple example we discuss the wave operator
$\Box_2 = \frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2}$ in two
dimensions, which has many weak solutions that are not strong solutions. Denote by
$f$ the characteristic function of the unit interval $[0, 1]$ and define $u(t, x) = f(x - t)$.
Then $u \in L^1_{\mathrm{loc}}(\mathbb{R}^2)$ and $\Box_2 u = 0$ in the sense of distributions. But $u$ is not a strong
solution.

In the context of ordinary linear differential operators we have learned already 
about the basic role that a fundamental solution plays in the process of finding 
solutions. This will be the same for linear partial differential operators with constant 
coefficients. Accordingly, we repeat the formal definition. 


Definition 8.1 Given a differential operator of the form (8.10), every distribution
$E \in \mathcal{D}'(\mathbb{R}^n)$ that satisfies the distributional equation

\[
P(D)E = \delta
\]


is called a fundamental solution of this differential operator. 

In the case of ordinary differential operators we saw that every constant coefficient 
operator has a fundamental solution and we learned how to construct them. For 
partial differential operators the corresponding problem is much more difficult. We 
indicate briefly the main reason. While for a polynomial in one variable the set of
zeros (roots) is a finite set of isolated points, the set of zeros of a polynomial in $n > 1$
variables consists in general of several lower dimensional manifolds in $\mathbb{R}^n$.

It is worthwhile mentioning that some variation of the concept of a fundamental 
solution is used in physics under the name Green’s function. A Green’s function is 
a fundamental solution that satisfies certain boundary conditions. In the following 
section and in the sections on tempered distributions we are going to determine 
fundamental solutions of differential operators that are important in physics. 




Despite these complications, B. Malgrange (1953) and L. Ehrenpreis (1954) 
proved independently of each other that every constant coefficient partial differential 
operator has a fundamental solution. 


Theorem 8.2 Every partial differential operator $P(D) = \sum_{|\alpha| \le m} a_\alpha D^\alpha$, $a_\alpha \in \mathbb{C}$,
has at least one fundamental solution.

The proof of this basic result is beyond the scope of this introduction and we have 
to refer to the specialized literature, for instance [1]. Knowing the existence of a 
fundamental solution for a PDE-operator (8.10), the problem of existence of solutions 
of PDEs of the form (8.11) has an obvious solution. 


Theorem 8.3 Every linear PDE in $\mathcal{D}'(\mathbb{R}^n)$ with constant coefficients

\[
\sum_{|\alpha| \le m} a_\alpha D^\alpha T = U
\]

has a solution in $\mathcal{D}'(\mathbb{R}^n)$ for all those $U \in \mathcal{D}'(\mathbb{R}^n)$ for which there is a fundamental
solution $E \in \mathcal{D}'(\mathbb{R}^n)$ such that the pair $(E, U)$ satisfies the support condition. In this
case a special solution is

\[
T = E * U. \tag{8.12}
\]

Such a solution exists in particular for all distributions $U \in \mathcal{E}'(\mathbb{R}^n)$ of compact
support.


Proof If we have a fundamental solution E such that the pair (E, U) satisfies the 
support condition, then we know that the convolution E * U is well defined. The 
rules of calculation for convolution products now yield 


\[
P(D)(E * U) = (P(D)E) * U = \delta * U = U,
\]

hence $T = E * U$ solves the equation in the sense of distributions. If a distribution
$U$ has a compact support, then the support condition for the pair $(E, U)$ is satisfied
for every fundamental solution and thus we conclude.

Obviously, a differential operator of the form (8.10) leaves the support of a distribution
invariant: $\operatorname{supp}(P(D)T) \subseteq \operatorname{supp} T$ for all $T \in \mathcal{D}'(\mathbb{R}^n)$, but not necessarily
the singular support as defined in Definition 3.5. Those constant coefficient partial
differential operators that do not change the singular support of any distribution play
a very important role in the solution theory for linear partial differential operators.
They are called hypoelliptic for reasons that become apparent later.


Definition 8.2 A linear partial differential operator with constant coefficients $P(D)$
is called hypoelliptic if, and only if,

\[
\operatorname{sing\,supp} P(D)T = \operatorname{sing\,supp} T \qquad \forall\, T \in \mathcal{D}'(\mathbb{R}^n). \tag{8.13}
\]

Since one always has $\operatorname{sing\,supp} P(D)T \subseteq \operatorname{sing\,supp} T$, this definition is equivalent to
the following statement: If $P(D)T$ is of class $C^{\infty}$ on some open subset $\Omega \subset \mathbb{R}^n$, then
$T$ itself is of class $C^{\infty}$ on $\Omega$. With this in mind, we present a detailed characterization
of hypoelliptic partial differential operators in terms of regularity properties of their
fundamental solutions.


Theorem 8.4 Let $P(D)$ be a linear constant coefficient partial differential operator.
The following statements are equivalent:


(a) $P(D)$ is hypoelliptic.
(b) $P(D)$ has a fundamental solution $E \in C^{\infty}(\mathbb{R}^n \setminus \{0\})$.
(c) Every fundamental solution $E$ of $P(D)$ belongs to $C^{\infty}(\mathbb{R}^n \setminus \{0\})$.


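Proof We start with the observation that Dirac’s delta distribution is of class $C^{\infty}$
on $\mathbb{R}^n \setminus \{0\}$. If we now apply condition (8.13) to a fundamental solution $E$ of the
operator $P(D)$, we get

\[
\operatorname{sing\,supp} E = \operatorname{sing\,supp} (P(D)E) = \operatorname{sing\,supp} \delta = \{0\},
\]

hence (a) implies (c). The implication (c) $\Rightarrow$ (b) is trivial. Thus we are left with
showing (b) $\Rightarrow$ (a).

Suppose $E \in C^{\infty}(\mathbb{R}^n \setminus \{0\})$ is a fundamental solution of the operator $P(D)$. Assume
furthermore that $\Omega \subset \mathbb{R}^n$ is a nonempty open subset and $T \in \mathcal{D}'(\mathbb{R}^n)$ a
distribution such that $P(D)T \in C^{\infty}(\Omega)$ holds. Now it suffices to show that $T$ itself
is of class $C^{\infty}$ in a neighborhood of each point $x$ in $\Omega$. Given any $x \in \Omega$, there is
an $r > 0$ such that the open ball $B_{2r}(x)$ is contained in $\Omega$. There is a test function
$\phi \in \mathcal{D}(\mathbb{R}^n)$ such that $\operatorname{supp} \phi \subset B_r(0)$ and $\phi(y) = 1$ for all $y$ in some neighborhood
$V$ of zero.

Using Leibniz’ rule we calculate

\[
\begin{aligned}
P(D)(\phi E) &= \sum_{|\alpha| \le m} a_\alpha \sum_{\beta + \gamma = \alpha} \frac{\alpha!}{\beta!\,\gamma!}\, D^\beta \phi\, D^\gamma E \\
&= \phi\, P(D)E + \sum_{|\alpha| \le m} a_\alpha \sum_{\substack{\beta + \gamma = \alpha \\ \beta \neq 0}} \frac{\alpha!}{\beta!\,\gamma!}\, D^\beta \phi\, D^\gamma E = \phi\,\delta + \psi = \delta + \psi,
\end{aligned}
\]

where $\psi$ denotes the double sum over the terms with $\beta \neq 0$.
The properties of $\phi$ imply that the function $\psi$ vanishes on the neighborhood $V$ and
has its support in $B_r(0)$; by assumption (b), the function $\psi$ is of class $C^{\infty}$ on $\mathbb{R}^n \setminus \{0\}$,
hence $\psi \in \mathcal{D}(\mathbb{R}^n)$, and we can regularize the distribution $T$ by $\psi$ and find

\[
T + \psi * T = (\delta + \psi) * T = [P(D)(\phi E)] * T = (\phi E) * (P(D)T),
\]

or $T = (\phi E) * P(D)T - \psi * T$. Since $\psi \in \mathcal{D}(\mathbb{R}^n)$, the regularization $\psi * T$ is a
$C^{\infty}$-function. Moreover, since $\phi E$ has its support in $B_r(0)$, the restriction of
$(\phi E) * P(D)T$ to a neighborhood of $x$ depends only on the restriction of $P(D)T$ to $B_{2r}(x)$,
where $P(D)T$ is of class $C^{\infty}$. Hence $T$ is of class $C^{\infty}$ in a neighborhood of $x$.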


8.4 Elementary Solutions of Partial Differential Operators 


Theorems 8.3 and 8.4 of the previous section are the core of the solution theory
for linear partial differential equations with constant coefficients and through them we learn
that, and why, it is important to know elementary solutions of constant coefficient
partial differential operators explicitly. Accordingly, we determine in this section the
elementary solutions of differential operators that are important in physics. In
some cases we include a discussion of relevant physical aspects. Later in the section
on Fourier transforms and tempered distributions we learn about another method to
obtain elementary solutions.


8.4.1 The Laplace Operator $\Delta_n = \sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2}$ in $\mathbb{R}^n$


The Laplace operator occurs in a number of differential equations that play an im- 
portant role in physics. After we have determined the elementary solution for this 
operator we discuss some of the applications in physics. 


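Proposition 8.3 The function $E_n : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$, defined by

\[
E_n(x) =
\begin{cases}
\dfrac{1}{2\pi} \log |x| & \text{for } n = 2, \\[2ex]
-\dfrac{1}{(n-2)\,|S_n|}\, |x|^{2-n} & \text{for } n \ge 3,
\end{cases}
\tag{8.14}
\]

where $|S_n| = 2\pi^{n/2}\, \Gamma(n/2)^{-1}$ is the area of the unit sphere $S_n$ in $\mathbb{R}^n$, has the following
properties:


(a) $E_n \in L^1_{\mathrm{loc}}(\mathbb{R}^n) \cap C^{\infty}(\mathbb{R}^n \setminus \{0\})$;
(b) $\Delta_n E_n(x) = 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$;
(c) $E_n$ is the elementary solution of the Laplace operator $\Delta_n$ in $\mathbb{R}^n$, which is thus
hypoelliptic.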


Proof Using polar coordinates it is an elementary calculation to show that $E_n$ is
locally integrable in $\mathbb{R}^n$. Similarly, standard differentiation rules imply that $E_n$ is of
class $C^{\infty}$ on $\mathbb{R}^n \setminus \{0\}$. This proves part (a). The elementary proof of part (b) is left as
an exercise. Uniqueness of the elementary solution for the Laplace operator follows
from Hörmander’s theorem (see Theorem 10.8). Thus we are left with proving that
the function $E_n$ is an elementary solution.

For any test function $\phi \in \mathcal{D}(\mathbb{R}^n)$, we calculate

\[
\langle \Delta_n E_n, \phi \rangle = \int E_n(x) \Delta_n \phi(x)\,dx = \lim_{r \to 0} \int_{[r \le |x| \le R]} E_n(x) \Delta_n \phi(x)\,dx,
\]

since $E_n$ is locally integrable and where $R$ is chosen such that $\operatorname{supp} \phi \subset B_R(0)$.
Here $[r \le |x| \le R]$ denotes the set $\{x \in \mathbb{R}^n : r \le |x| \le R\}$. Observe that $\phi$ vanishes
in a neighborhood of the boundary of the ball $B_R(0)$ and that $\Delta_n E_n(x) = 0$ in
$[r \le |x| \le R]$. Therefore, applying partial integration and Gauss’ theorem twice,
we get for the integral under the limit,

\[
-\int_{|x| = r} \phi(x) \nabla_n E_n(x) \cdot dS(x) + \int_{|x| = r} E_n(x) \nabla_n \phi(x) \cdot dS(x).
\]

In the exercises one shows that the limit $r \to 0$ of the first integral gives $\phi(0)$ while
the limit of the second integral is zero. It follows $\langle \Delta_n E_n, \phi \rangle = \phi(0) = \langle \delta, \phi \rangle$ for all
$\phi \in \mathcal{D}(\mathbb{R}^n)$ and thus $\Delta_n E_n = \delta$.

The case $n = 1$ is elementary. We claim $E_1(x) = x\theta(x)$ is the elementary solution
of $\Delta_1 = \frac{d^2}{dx^2}$. The proof is a straightforward differentiation in the sense of distributions
and is left as an exercise.

Now we discuss the case $n = 3$ that is of particular importance for physics. The
fundamental solution for $\Delta_3$ is

\[
E_3(x) = -\frac{1}{4\pi} \frac{1}{|x|} \qquad \forall\, x \in \mathbb{R}^3 \setminus \{0\}. \tag{8.15}
\]

This solution is well known in physics in connection with the Poisson equation

\[
\Delta_3 U = \rho, \tag{8.16}
\]

where $\rho$ is a given density (of masses or electrical charges), and one is looking for
the potential $U$ generated by this density. In physics we learn that this potential is
given by the formula

\[
U(x) = -\frac{1}{4\pi} \int_{\mathbb{R}^3} \frac{\rho(y)}{|x - y|}\,dy \qquad \forall\, x \in \mathbb{R}^3 \tag{8.17}
\]

whenever $\rho$ is an integrable function. One easily recognizes that this solution formula
is just the convolution formula for this special case:

\[
U(x) = E_3 * \rho(x).
\]

Certainly, the formula $U = E_3 * \rho$ gives the solution of Eq. (8.16) for all $\rho \in \mathcal{E}'(\mathbb{R}^3)$,
not just for integrable densities.
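
Two properties of $E_3(x) = -1/(4\pi|x|)$ used above can be checked numerically: it is harmonic away from the origin, and the flux of $\nabla E_3$ through any sphere centered at the origin equals 1, which is the classical counterpart of $\Delta_3 E_3 = \delta$. The following Python sketch does this with central differences. The sample point, the step sizes and the function names are assumptions chosen for illustration.

```python
import math

def E3(x, y, z):
    """E_3(x) = -1 / (4 pi |x|) for x != 0."""
    return -1.0 / (4.0 * math.pi * math.sqrt(x * x + y * y + z * z))

def laplacian(f, p, h=1e-3):
    """Second-order central-difference approximation of the Laplacian of f at the point p."""
    x, y, z = p
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h)
            - 6.0 * f(x, y, z)) / h ** 2

def radial_flux(r, h=1e-6):
    """Flux of grad E_3 through the sphere |x| = r, computed as (4 pi r^2) * dE_3/dr."""
    dE_dr = (E3(r + h, 0.0, 0.0) - E3(r - h, 0.0, 0.0)) / (2.0 * h)
    return 4.0 * math.pi * r * r * dE_dr

if __name__ == "__main__":
    print(laplacian(E3, (0.3, -0.4, 1.2)))   # close to 0
    print(radial_flux(0.5), radial_flux(2.0))  # both close to 1
```

The flux is independent of the radius, which is why the limit $r \to 0$ in the proof of Proposition 8.3 picks out the value $\phi(0)$.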


8.4.2 The PDE Operator $\frac{\partial}{\partial t} - \Delta_n$ of the Heat Equation
in $\mathbb{R}^{n+1}$


We proceed as in the case of the Laplace operator but refer for a discussion of the 
physical background of this operator to the physics literature. 


Proposition 8.4 The function $E_n$ defined on the set $(0, +\infty) \times \mathbb{R}^n$ by the formula

\[
E_n(t, x) = \left( \frac{1}{2\sqrt{\pi t}} \right)^{n} e^{-\frac{|x|^2}{4t}} \tag{8.18}
\]

has the following properties:
(a) $E_n \in L^1_{\mathrm{loc}}(\mathbb{R}^{n+1}) \cap C^{\infty}((0, +\infty) \times \mathbb{R}^n)$;
(b) $(\frac{\partial}{\partial t} - \Delta_n) E_n(t, x) = 0$ for all $(t, x) \in (0, +\infty) \times \mathbb{R}^n$;
(c) $E_n$ is the elementary solution of the operator $\frac{\partial}{\partial t} - \Delta_n$, which is thus hypoelliptic.


Proof Since the statements of this proposition are quite similar to the result on the
elementary solution of the Laplace operator, it is natural that we can use nearly the
same strategy of proof. Certainly, the function (8.18) is of class $C^{\infty}$ on $(0, +\infty) \times \mathbb{R}^n$.
In order to show that this function is locally integrable on $\mathbb{R}^{n+1}$, it suffices to show
that the integral

\[
I(t) = \int_0^t \int_{|x| \le R} E_n(s, x)\,dx\,ds
\]

is finite, for every $t > 0$ and every $R > 0$. For every $s > 0$, the integral with
respect to $x$ can be estimated in absolute value by 1, after the change of variables
$x = 2\sqrt{s}\,y$. Thus it follows that $I(t) \le t$ and therefore $E_n \in L^1_{\mathrm{loc}}(\mathbb{R}^{n+1})$. Elementary
differentiation shows that part (b) holds. Again, uniqueness of the elementary solution
follows from Hörmander’s theorem (Theorem 10.8).

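Now take any $\phi \in \mathcal{D}(\mathbb{R}^{n+1})$; since $E_n$ is locally integrable it follows, using
$\partial_t = \frac{\partial}{\partial t}$, that

\[
\begin{aligned}
\langle (\partial_t - \Delta_n) E_n, \phi \rangle &= -\langle E_n, (\partial_t + \Delta_n)\phi \rangle \\
&= -\lim_{r \to 0} \int_r^{\infty} \int_{\mathbb{R}^n} E_n(t, x) \bigl( \partial_t \phi(t, x) + \Delta_n \phi(t, x) \bigr)\,dx\,dt = -\lim_{r \to 0} I_r(\phi),
\end{aligned}
\]

where $I_r(\phi)$ denotes the double integral under the limit. Since $\phi$ has a compact support, repeated partial integration in connection with Gauss’
theorem yields

\[
\int_{\mathbb{R}^n} E_n(t, x) (\Delta_n \phi)(t, x)\,dx = \int_{\mathbb{R}^n} (\Delta_n E_n)(t, x) \phi(t, x)\,dx
\]

for every $t > 0$. Therefore, by partial integration with respect to $t$, we find

\[
\begin{aligned}
I_r(\phi) &= \int_r^{\infty} \int_{\mathbb{R}^n} (\Delta_n E_n)(t, x) \phi(t, x)\,dx\,dt + \int_{\mathbb{R}^n} \Bigl[ E_n(t, x) \phi(t, x) \Bigr]_{t=r}^{t=\infty} dx
- \int_r^{\infty} \int_{\mathbb{R}^n} (\partial_t E_n)(t, x) \phi(t, x)\,dx\,dt \\
&= \int_r^{\infty} \int_{\mathbb{R}^n} \bigl( (-\partial_t + \Delta_n) E_n \bigr)(t, x) \phi(t, x)\,dx\,dt - \int_{\mathbb{R}^n} E_n(r, x) \phi(r, x)\,dx \\
&= -\int_{\mathbb{R}^n} E_n(r, x) \phi(r, x)\,dx.
\end{aligned}
\]

Here we have used Fubini’s theorem for integrable functions to justify the exchange
of the order of integration, and in the last identity we have used part (b). This allows
the conclusion

\[
\begin{aligned}
\langle (\partial_t - \Delta_n) E_n, \phi \rangle &= \lim_{r \to 0} \int_{\mathbb{R}^n} E_n(r, x) \phi(r, x)\,dx \\
&= \left( \frac{1}{\sqrt{\pi}} \right)^{n} \lim_{r \to 0} \int_{\mathbb{R}^n} e^{-|y|^2} \phi(r, 2\sqrt{r}\,y)\,dy \\
&= \phi(0, 0),
\end{aligned}
\]

where we used the new integration variable $y = \frac{x}{2\sqrt{r}}$, Lebesgue’s theorem of
dominated convergence, and the fact that

\[
\int_{\mathbb{R}^n} e^{-|y|^2}\,dy = (\sqrt{\pi})^{n}.
\]

Since $\phi \in \mathcal{D}(\mathbb{R}^{n+1})$ is arbitrary, this shows that $(\partial_t - \Delta_n) E_n = \delta$ and hence the given
function is indeed the elementary solution of the operator $\partial_t - \Delta_n$.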
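
For $n = 1$ the two facts that drive this proof can be observed numerically: the kernel has total mass 1 for every $t > 0$ (so that it concentrates at the origin as $t \to 0$), and it satisfies the heat equation for $t > 0$. The Python sketch below checks both with a midpoint rule and central differences. The integration window, the step sizes and the function names are assumptions made for this illustration.

```python
import math

def heat_kernel(t, x):
    """E_1(t, x) = exp(-x^2 / (4 t)) / (2 sqrt(pi t)) for t > 0, and 0 for t <= 0."""
    if t <= 0:
        return 0.0
    return math.exp(-x * x / (4.0 * t)) / (2.0 * math.sqrt(math.pi * t))

def total_mass(t, half_width=20.0, steps=40000):
    """Midpoint-rule approximation of the integral of E_1(t, .) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    return h * sum(heat_kernel(t, -half_width + (k + 0.5) * h) for k in range(steps))

def heat_residual(t, x, h=1e-3):
    """Central-difference value of (d/dt - d^2/dx^2) E_1 at (t, x), expected to be near 0."""
    dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2.0 * h)
    dxx = (heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x) + heat_kernel(t, x - h)) / h ** 2
    return dt - dxx

if __name__ == "__main__":
    print(total_mass(0.5), total_mass(0.05))
    print(heat_residual(1.0, 0.5))
```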


8.4.3 The Wave Operator $\Box_4 = \partial_0^2 - \Delta_3$ in $\mathbb{R}^4$


Here we use the notation $\partial_0 = \frac{\partial}{\partial x_0}$. In applications to physics, the variable $x_0$ has the
interpretation of $x_0 = ct$, $c$ being the velocity of light and $t$ the time variable. The
variable $x \in \mathbb{R}^3$ stands for the space coordinate. For the wave operator, Hörmander’s
theorem does not apply and accordingly several elementary solutions for the wave
operator are known. We mention two solutions:

\[
E_{r,a}(x_0, x) = \frac{1}{2\pi}\, \theta(\pm x_0)\, \delta(x_0^2 - x^2). \tag{8.19}
\]

These distributions are defined as follows:

\[
\langle \theta(\pm x_0)\, \delta(x_0^2 - x^2), \phi(x_0, x) \rangle = \int_{\mathbb{R}^3} \phi(\pm |x|, x)\, \frac{dx}{2|x|}.
\]

Since the function $x \mapsto \frac{1}{|x|}$ is integrable over compact sets in $\mathbb{R}^3$, these are indeed
well-defined distributions.


Proposition 8.5 The distributions (8.19) are two elementary solutions of the wave
operator $\Box_4$ in dimension 4. Their support properties are:

\[
\operatorname{supp} E_r = \{ (x_0, x) \in \mathbb{R}^4 : x_0 \ge 0,\ x_0^2 - x^2 = 0 \},
\]

\[
\operatorname{supp} E_a = \{ (x_0, x) \in \mathbb{R}^4 : x_0 \le 0,\ x_0^2 - x^2 = 0 \}.
\]


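Proof The obvious invariance of the wave equation under rotations in $\mathbb{R}^3$ can be
used to reduce the number of dimensions that have to be considered. This can be
done by averaging over the unit sphere $S^2$ in $\mathbb{R}^3$. Accordingly, to every $\phi \in \mathcal{D}(\mathbb{R}^4)$,
we assign a function $\tilde{\phi} : \mathbb{R} \times \mathbb{R}^+ \to \mathbb{C}$ by the formula

\[
\tilde{\phi}(t, s) = \int_{S^2} \phi(t, s\omega)\,d\omega,
\]

where $d\omega$ denotes the normalized surface measure on $S^2$. Introducing polar
coordinates in $\mathbb{R}^3$, we thus see

\[
\langle E_r, \phi \rangle = \int_0^{\infty} s\, \tilde{\phi}(s, s)\,ds.
\]

In the exercises it is shown that

\[
\widetilde{\Box_4 \phi}(t, s) = \left( \frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial s^2} - \frac{2}{s} \frac{\partial}{\partial s} \right) \tilde{\phi}(t, s).
\]

Thus we get

\[
\langle \Box_4 E_r, \phi \rangle = \langle E_r, \Box_4 \phi \rangle = \int_0^{\infty} t\, \widetilde{\Box_4 \phi}(t, t)\,dt.
\]

Introducing, for $t \ge 0$, the auxiliary function

\[
u(t) = t\, \frac{\partial \tilde{\phi}}{\partial t}(t, t) - t\, \frac{\partial \tilde{\phi}}{\partial s}(t, t) - \tilde{\phi}(t, t),
\]

which has the derivative

\[
u'(t) = t \left[ \frac{\partial^2 \tilde{\phi}}{\partial t^2} - \frac{\partial^2 \tilde{\phi}}{\partial s^2} - \frac{2}{s} \frac{\partial \tilde{\phi}}{\partial s} \right]_{s = t} = t\, \widetilde{\Box_4 \phi}(t, t),
\]

it follows that

\[
\langle \Box_4 E_r, \phi \rangle = \int_0^{\infty} u'(t)\,dt = -u(0) = \tilde{\phi}(0, 0) = \phi(0) = \langle \delta, \phi \rangle,
\]

and thus we conclude that $E_r$ is an elementary solution of the wave operator. The
argument for $E_a$ is quite similar.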

Remark 8.2 


1. Though the wave operator $\Box_4$ is not hypoelliptic, it can be shown that it is
hypoelliptic in the variable $x_0$. This means that every weak solution $u(x_0, x)$ of the
wave equation is a $C^{\infty}$-function in $x_0$ (see [1]).

2. Later with the help of Fourier transformation for tempered distributions, we will
give another proof for $E_{r,a}$ being elementary solutions of the wave operator.

3. In particular, in applications to physics, the support properties will play an important
role. According to these support properties, one calls $E_r$ a retarded and $E_a$
an advanced elementary solution. The reasoning behind these names is apparent
from the following discussion of solutions of Maxwell’s equation.


8.4.3.1 Maxwell’s Equation in Vacuum 


Introducing the abbreviations $x_0 = ct$ and $\partial_0 = \frac{\partial}{\partial x_0}$, Maxwell’s equation in vacuum
can be written as follows (see [2]):

\[
\begin{aligned}
\operatorname{curl} E + \partial_0 B &= 0 && \text{Faraday’s law} \\
\operatorname{div} B &= 0 && \text{source-free magnetic field} \\
\operatorname{curl} B - \partial_0 E &= j && \text{Maxwell’s form of Ampère’s law} \\
\operatorname{div} E &= \rho && \text{Coulomb’s law}
\end{aligned}
\]

In courses on electrodynamics it is shown: Given a density $\rho$ of electric charges and
a density $j$ of electric currents, the electric field $E$ and the magnetic field $B$ are given
by

\[
B = \operatorname{curl} A, \qquad E = -\nabla \Phi - \partial_0 A,
\]

where $(\Phi, A)$ are the electromagnetic potentials. In the Lorenz gauge, i.e., $\partial_0 \Phi +
\operatorname{div} A = 0$, these electromagnetic potentials are solutions of the inhomogeneous
wave equations

\[
\Box_4 \Phi = \rho, \qquad \Box_4 A = j.
\]

(The last equation is understood component-wise, i.e., $\Box_4 A_i = j_i$ for $i = 1, 2, 3$.)

Thus the problem of solving Maxwell’s equations in vacuum has been put into a 
form to which our previous results apply since we know elementary solutions of the 
wave operator. 

In concrete physical situations, the densities of charges and currents are switched
on at a certain moment that we choose to be our time reference point $t = 0$. Then
one knows $\operatorname{supp} \rho,\ \operatorname{supp} j \subseteq \{(x_0, x) \in \mathbb{R}^4 : x_0 \ge 0\}$.

It follows that the pairs $(E_r, \rho)$ and $(E_r, j)$ satisfy the support condition and thus
the convolution products $E_r * \rho$ and $E_r * j$ are well defined. We conclude that the
electromagnetic potentials are given by

\[
(\Phi, A) = (E_r * \rho,\ E_r * j),
\]

which in turn give the electromagnetic field as mentioned above. Because of the
known support properties of $E_r$ and $\rho$ and the formula for the support of a convolution,
we know: $\operatorname{supp} \Phi \subseteq \{(x_0, x) \in \mathbb{R}^4 : x_0 \ge 0\}$ and similarly for $A$. Hence our solution
formula shows causality, i.e., no electromagnetic field before the charge and current
densities are switched on! The other elementary solution $E_a$ of the wave operator
does not allow this conclusion.

Note that the above formula gives a solution for Maxwell’s equation not only for
proper densities ($\rho \in L^1(\mathbb{R}^3)$) but also for the case where $\rho$ is any distribution with
the support property used earlier. The same applies to $j$.

Under well-known decay properties for $\rho$ and $j$ for $|x| \to +\infty$, one can show that
the electromagnetic field $(E, B)$ determined above is the only solution to Maxwell’s
equation in vacuum.




8.5 Exercises 


1. Let $f : \mathbb{R}^+ \to \mathbb{R}$ be a continuous function. In $\mathcal{D}'_+(\mathbb{R})$, find a special solution of
the ordinary differential equation

\[
y^{(4)} - 8y^{(2)} + 16y = f
\]

and verify that it is actually a classical solution.
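2. Let $K : \mathbb{R}^+ \to \mathbb{R}$ be a continuous function. For $n = 2, 3, \ldots$, define

\[
K^{*n}(x) = \int_0^x K^{*(n-1)}(y) K(x - y)\,dy, \qquad \forall\, x \in \mathbb{R}^+
\]

and show that for every $0 < r < \infty$, one has

\[
|K^{*n}(x)| \le M_r^{n}\, \frac{x^{n-1}}{(n-1)!} \qquad \forall\, x \in [0, r].
\]

Here $M_r = \sup_{0 \le x \le r} |K(x)|$.
3. Let $\Delta_n$ be the Laplace operator in $\mathbb{R}^n$ ($n = 2, 3, \ldots$). For $a \in \mathbb{R}^n$, solve the PDE

\[
\Delta_n u = \delta_a.
\]

4. For the function $E_n$ of Eq. (8.18), show $(\partial_t - \Delta_n) E_n(t, x) = 0$ for all $(t, x) \in
(0, +\infty) \times \mathbb{R}^n$.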

5. Find the causal solution of Maxwell’s equations in vacuum. 
Hints: Use the retarded elementary solution of the wave operator and calculate 
E and B according to the formulae given in the text. 


References 


1. Hörmander L. The analysis of linear partial differential operators 2. Differential operators of
constant coefficients. Berlin: Springer-Verlag; 1983. 

2. Thirring W. A course in mathematical physics : classical dynamical systems and classical field 
theory. Springer study edition. New York: Springer-Verlag; 1992. 


Chapter 9 
Holomorphic Functions 


This chapter gives a brief introduction to the theory of holomorphic functions of one 
complex variable from a special point of view which defines holomorphic functions 
as elements of the kernel or null space of a certain hypoelliptic differential operator. 
Thus, this chapter offers a new perspective of some aspects on the theory of functions 
of one complex variable. A comprehensive modern presentation of this classical 
subject is [1]. 

Our starting point will be the observation that the differential operator in $\mathcal{D}'(\mathbb{R}^2)$

\[
\bar{\partial} = \frac{1}{2} \left( \frac{\partial}{\partial x} + i \frac{\partial}{\partial y} \right)
\]

is hypoelliptic, together with some basic results about convergence in the sense of distributions.
Then holomorphic functions will be defined as elements in the null space in $\mathcal{D}'(\mathbb{R}^2)$
of this differential operator. Relative to the theory of distributions developed thus far,
this approach to the theory of holomorphic functions is fairly easy, though certainly
this is neither a standard nor too direct an approach.


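9.1 Hypoellipticity of $\bar{\partial}$

We begin by establishing several basic facts about the differential operator $\bar{\partial}$ in
$\mathcal{D}'(\mathbb{R}^2)$.

Lemma 9.1 The regular distribution on $\mathbb{R}^2$, $(x, y) \mapsto \frac{1}{\pi(x + iy)}$, is an elementary
solution of the differential operator $\bar{\partial}$ in $\mathcal{D}'(\mathbb{R}^2)$, i.e., in $\mathcal{D}'(\mathbb{R}^2)$ one has

\[
\bar{\partial}\, \frac{1}{\pi(x + iy)} = \delta.
\]

Proof It is easy to see that the function $(x, y) \mapsto \frac{1}{\pi(x + iy)}$ is locally integrable on $\mathbb{R}^2$
and thus it defines a regular distribution. On $\mathbb{R}^2 \setminus \{0\}$ a straightforward differentiation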


© Springer International Publishing Switzerland 2015 119 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_9 


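shows $\bar{\partial}\, \frac{1}{x + iy} = 0$. Now take any $\phi \in \mathcal{D}(\mathbb{R}^2)$ and calculate

\[
\left\langle \bar{\partial}\, \frac{1}{\pi(x + iy)}, \phi \right\rangle = -\left\langle \frac{1}{\pi(x + iy)}, \bar{\partial}\phi \right\rangle = -\int_{\mathbb{R}^2} \frac{\bar{\partial}\phi(x, y)}{\pi(x + iy)}\,dx\,dy.
\]

Since the integrand is absolutely integrable this integral can be represented as

\[
= -\lim_{r \to 0} \int_{r \le \sqrt{x^2 + y^2} \le R} \frac{\bar{\partial}\phi(x, y)}{\pi(x + iy)}\,dx\,dy,
\]

where $R$ is chosen large enough such that $\operatorname{supp} \phi \subset B_R(0)$. For any $0 < r < R$, we
observe that

\[
\int_{r \le \sqrt{x^2 + y^2} \le R} \frac{\bar{\partial}\phi(x, y)}{\pi(x + iy)}\,dx\,dy = \int_{r \le \sqrt{x^2 + y^2} \le R} \bar{\partial} \left( \frac{\phi(x, y)}{\pi(x + iy)} \right) dx\,dy
\]

since $\bar{\partial}\, \frac{1}{x + iy} = 0$ in $\mathbb{R}^2 \setminus \{0\}$. Recall the formula of Green-Riemann for a domain
$\Omega \subset \mathbb{R}^2$ with smooth boundary $\Gamma = \partial\Omega$ (see [2]):

\[
\int_{\Omega} \bar{\partial} u\,dx\,dy = \frac{1}{2i} \oint_{\Gamma} u(x, y)\,(dx + i\,dy) \tag{9.1}
\]

which we apply to the function $u(x, y) = \frac{\phi(x, y)}{\pi(x + iy)}$ on the annulus
$r \le \sqrt{x^2 + y^2} \le R$. Since $\phi$ vanishes on the outer circle, only the inner circle contributes, and we obtain

\[
\left\langle \bar{\partial}\, \frac{1}{\pi(x + iy)}, \phi \right\rangle = -\frac{1}{2i\pi} \lim_{r \to 0} \oint_{\sqrt{x^2 + y^2} = r} \frac{\phi(x, y)}{x + iy}\,(dx + i\,dy),
\]

where the circle carries the orientation induced by the annulus, i.e., it is traversed clockwise.
Introducing polar coordinates $x = r\cos\theta$, $y = r\sin\theta$ and taking this orientation into account, this limit becomes

\[
\lim_{r \to 0} \frac{1}{2\pi} \int_0^{2\pi} \phi(r\cos\theta, r\sin\theta)\,d\theta = \phi(0, 0) = \langle \delta, \phi \rangle
\]

and thus we conclude.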


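Corollary 9.1 The differential operator $\bar{\partial}$ in $\mathcal{D}'(\mathbb{R}^2)$ is hypoelliptic, i.e., every
distribution $T \in \mathcal{D}'(\mathbb{R}^2)$ for which $\bar{\partial}T$ is of class $C^{\infty}$ on some open set $\Omega \subset \mathbb{R}^2$ is
itself of class $C^{\infty}$ on $\Omega$.


Proof Lemma 9.1 gives an elementary solution of $\bar{\partial}$ which is of class $C^{\infty}$ on $\mathbb{R}^2 \setminus \{0\}$.
Thus by Theorem 8.4 we conclude.

If we apply this corollary to a distribution $T$ on $\mathbb{R}^2$ which satisfies $\bar{\partial}T = 0$, it
follows immediately that $T$ is equal to a $C^{\infty}$-function $g$, $T = I_g$, for some $g \in
C^{\infty}(\mathbb{R}^2)$, since obviously the zero function is of class $C^{\infty}$ everywhere. Therefore the
null space of the operator $\bar{\partial}$ on $\mathcal{D}'(\mathbb{R}^2)$ can be described as

\[
\ker \bar{\partial} = \{ T \in \mathcal{D}'(\mathbb{R}^2) : \bar{\partial}T = 0 \} = \{ g \in C^{\infty}(\mathbb{R}^2) : \bar{\partial}g = 0 \}
\]

where as usual we identify the function $g$ and the regular distribution $I_g$.
Now let $\Omega \subset \mathbb{R}^2$ be a nonempty open set. Similarly one deduces

\[
\ker \bigl( \bar{\partial}|_{\mathcal{D}'(\Omega)} \bigr) = \{ T \in \mathcal{D}'(\Omega) : \bar{\partial}T = 0 \} = \{ g \in C^{\infty}(\Omega) : \bar{\partial}g = 0 \}.
\]

This says in particular that a complex-valued function $g$ in $L^1_{\mathrm{loc}}(\Omega)$ which satisfies
$\bar{\partial}g = 0$ in the sense of distributions is actually a $C^{\infty}$-function on $\Omega$.
As usual we identify the point $(x, y) \in \mathbb{R}^2$ with the complex number $z = x + iy$.
Under this identification we introduce

\[
H(\Omega) = \{ g \in L^1_{\mathrm{loc}}(\Omega) : \bar{\partial}g = 0 \text{ in } \mathcal{D}'(\Omega) \} = \{ u \in C^{\infty}(\Omega) : \bar{\partial}u = 0 \}. \tag{9.2}
\]

Elements in $H(\Omega)$ are called holomorphic functions on $\Omega$. The following theorem
lists the basic properties of the space of holomorphic functions.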


Theorem 9.1 Let $\Omega \subset \mathbb{C}$ be a nonempty open set. The space $H(\Omega)$ of holomorphic
functions on $\Omega$ has the following properties:


1. $H(\Omega)$ is a complex algebra.

2. $H(\Omega)$ is complete for the topology of uniform convergence on all compact subsets
of $\Omega$.

3. If $u \in H(\Omega)$ does not vanish on $\Omega$, then $\frac{1}{u} \in H(\Omega)$.

4. If a function $u$ is holomorphic on $\Omega$ and a function $v$ is holomorphic on an open
set $\tilde{\Omega}$ which contains $u(\Omega)$, then the composition $v \circ u$ is holomorphic on $\Omega$.


Proof As the null space of a linear operator on a complex vector space, $H(\Omega)$ is certainly a
complex vector space. The product rule of differentiation easily implies that with
$u, v \in H(\Omega)$ also the (pointwise) product $u \cdot v$ belongs to $H(\Omega)$. The verification
that with this product $H(\Omega)$ is indeed an algebra is straightforward and is left as an
exercise.

Suppose that $(u_n)$ is a Cauchy sequence in $H(\Omega)$ for the topology of uniform
convergence on all compact sets $K \subset \Omega$. It follows that there is some continuous
function $u$ on $\Omega$ such that the sequence $u_n$ converges uniformly to $u$ on every compact
set $K \subset \Omega$. Take any $\phi \in \mathcal{D}(\Omega)$. It follows, as $n \to \infty$, that

\[
\int u_n(x + iy) \phi(x, y)\,dx\,dy \to \int u(x + iy) \phi(x, y)\,dx\,dy;
\]

thus $u_n \to u$ in $\mathcal{D}'(\Omega)$. As a linear differential operator with constant coefficients
the operator $\bar{\partial} : \mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$ is continuous and therefore $\bar{\partial}u = \lim_{n \to \infty} \bar{\partial}u_n =
\lim_{n \to \infty} 0 = 0$. We conclude $u \in H(\Omega)$. This proves the second part.

If $u \in H(\Omega)$ has no zeroes in $\Omega$, then $\frac{1}{u}$ is a well-defined continuous function on
$\Omega$ and the differentiation rules imply $\bar{\partial}\, \frac{1}{u} = -\frac{1}{u^2}\, \bar{\partial}u = 0$, hence $\frac{1}{u} \in H(\Omega)$.

The final part follows by a straightforward application of the chain rule and is in
the Exercises.

It is easy to give many examples of holomorphic functions. Naturally, every constant
function $u = a \in \mathbb{C}$ satisfies $\bar{\partial}a = 0$ and thus all constants belong to $H(\Omega)$.
Next consider the function $z \mapsto z$. It follows that $\bar{\partial}z = \frac{1}{2}(1 - 1) = 0$, hence this
function belongs to $H(\Omega)$ too. Since we learned in Theorem 9.1 that $H(\Omega)$ is a
complex algebra, it follows immediately that all polynomials $P(z) = \sum_{n=0}^{N} a_n z^n$,
$a_n \in \mathbb{C}$, belong to $H(\Omega)$.
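
The defining condition can be probed numerically by replacing the partial derivatives in $\bar{\partial} = \frac{1}{2}(\partial_x + i\,\partial_y)$ with central differences. The Python sketch below applies this to a sample polynomial, for which the result should be close to 0, and to $z \mapsto \bar{z}$, which is not holomorphic and gives 1. The polynomial coefficients, the step size and the function names are arbitrary choices made for this illustration.

```python
def dbar(f, z, h=1e-5):
    """Central-difference approximation of (1/2)(d/dx + i d/dy) f at z = x + i y."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx + 1j * df_dy)

def poly(z):
    """Sample polynomial P(z) = 1 + 2 z - 3 z^2 + (1 + i) z^3 (arbitrary coefficients)."""
    return 1 + 2 * z - 3 * z ** 2 + (1 + 1j) * z ** 3

def conj(z):
    """The function z -> conjugate(z), which is not holomorphic."""
    return z.conjugate()

if __name__ == "__main__":
    z0 = complex(0.3, 0.7)
    print(abs(dbar(poly, z0)))  # close to 0
    print(dbar(conj, z0))       # close to 1
```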

According to Theorem 9.1 the algebra $H(\Omega)$ is complete for the topology of
uniform convergence on all compact sets $K \subset \Omega$. Therefore, all functions $u : \Omega \to
\mathbb{C}$ which are the limit of a sequence of polynomials for this topology belong to $H(\Omega)$.
We investigate this case in more detail.

Recall some properties of power series (see for instance [1]). A power series
$\sum_{n=0}^{\infty} a_n (z - c)^n$ with center $c \in \mathbb{C}$ and coefficients $a_n \in \mathbb{C}$ has a unique disk of
convergence $B_R(c) = \{z \in \mathbb{C} : |z - c| < R\}$ where the radius of convergence $R$ is
determined by the coefficients $\{a_n : n = 0, 1, 2, \ldots\}$. On every compact subset $K$ of
the disk of convergence the power series converges uniformly, and thus defines a
complex-valued function $u$ on $B_R(c)$. From our earlier considerations it follows that
$u$ is holomorphic on this disk.

Let $\Omega \subset \mathbb{C}$ be a nonempty open set. A function $u : \Omega \to \mathbb{C}$ is said to be analytic
on $\Omega$ if, and only if, for every point $c \in \Omega$ there is some disk $B_r(c) \subset \Omega$ such that on
this disk the function $u$ is given by some power series, i.e., $u(z) = \sum_{n=0}^{\infty} a_n (z - c)^n$ for
all $z \in B_r(c)$. Since every compact subset $K \subset \Omega$ can be covered by a finite number
of such disks of convergence, it follows that every analytic function is holomorphic,
i.e.,

\[
A(\Omega) \subseteq H(\Omega)
\]

where $A(\Omega)$ denotes the set of all analytic functions on $\Omega$. In the following section,
we will learn that actually every holomorphic function is analytic so that these two
sets of functions are the same.


9.2 Cauchy Theory 


According to our definition a holomorphic function $u$ on an open set is a function which solves the differential equation $\bar\partial u = 0$. If this is combined with a well-known result from classical analysis, the Green-Riemann formula, the basic result of the Cauchy theory follows easily.


Theorem 9.2 (Theorem of Cauchy) Let $\Omega \subset \mathbb{C}$ be a nonempty open set and $B$ be an open set such that the closure $\overline{B}$ of $B$ is contained in $\Omega$. Assume that the boundary $\partial B$ of $B$ is piecewise smooth (i.e., piecewise of class $C^1$). Then, for all $u \in H(\Omega)$,

$$\oint_{\partial B} u(z)\,dz = 0. \tag{9.3}$$


Proof The proof of Cauchy’s theorem is a simple application of the Green—Riemann 
formula (9.1). 
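Cauchy's theorem (9.3) can be illustrated numerically for the special case where $B$ is a disk. The sketch below is not part of the original argument: the helper `circle_integral`, the sample functions, and the trapezoidal discretization of the circle are all assumptions made for illustration.

```python
import cmath
import math


def circle_integral(u, c, r, samples=2000):
    """Trapezoidal approximation of the integral of u(z) dz over |z - c| = r."""
    total = 0j
    for k in range(samples):
        w = cmath.exp(2j * math.pi * k / samples)   # point on the unit circle
        dz = 1j * r * w * (2 * math.pi / samples)   # dz = i r e^{i theta} d theta
        total += u(c + r * w) * dz
    return total


# An entire function: the integral over any circle should vanish.
entire = circle_integral(lambda z: z**2 * cmath.exp(z), 0.3 + 0.1j, 1.5)

# 1/z is not holomorphic on the whole disk |z| < 1, and the integral is 2*pi*i.
punctured = circle_integral(lambda z: 1 / z, 0.0, 1.0)

print(abs(entire))
print(punctured)
```

The second integral shows that the hypothesis $u \in H(\Omega)$ with $\overline{B} \subset \Omega$ cannot simply be dropped.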




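Theorem 9.3 (Cauchy's Integral Formula I) Let $\Omega \subset \mathbb{C}$ be a nonempty open set and $K \subset \Omega$ a compact subset whose boundary (with standard orientation) $\Gamma = \partial K$ is piecewise smooth. Then, for every $u \in H(\Omega)$, one has

$$\frac{1}{2\pi i}\oint_{\Gamma} \frac{u(z)}{z - z_0}\,dz = \begin{cases} 0 & \text{if } z_0 \notin K, \\ u(z_0) & \text{if } z_0 \in K \setminus \Gamma. \end{cases} \tag{9.4}$$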


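Proof Denote by $K^\circ = K \setminus \Gamma$ the interior of the compact set $K$ and by $\chi$ the characteristic function of $K$. Now, given any $u \in H(\Omega)$, introduce the regular distribution $T = \chi u$. Using $\bar\partial u = 0$ and again the Green-Riemann formula (9.1) we find, for any $\varphi \in \mathcal{D}(\Omega)$,

$$\langle \bar\partial T, \varphi \rangle = -\langle T, \bar\partial \varphi \rangle = -\int_K u\,\bar\partial\varphi \,dx\,dy = -\int_K \bar\partial(u\varphi)(z)\,dx\,dy = -\frac{1}{2i}\oint_{\Gamma} u(z)\varphi(z)\,dz.$$

Corollary 9.1 says that $\bar\partial \frac{1}{\pi z} = \delta$. Since $T$ has a compact support in $K$, the convolution with $\frac{1}{\pi z}$ exists and the identity $T = \delta * T = \bar\partial T * \frac{1}{\pi z}$ holds. Take a test function $\varphi$ which satisfies $\varphi(z) = 1$ for all $z \in K$ and which has its support in a sufficiently small neighborhood $U$ of $K$. For $z_0 \in \Omega \setminus \Gamma$ the combination of these identities yields

$$T(z_0) = \left(\bar\partial T * \frac{1}{\pi z}\right)(z_0) = \left\langle (\bar\partial T)(z), \frac{\varphi(z)}{\pi (z_0 - z)} \right\rangle = -\frac{1}{2\pi i}\oint_{\Gamma} \frac{u(z)\varphi(z)}{z_0 - z}\,dz = \frac{1}{2\pi i}\oint_{\Gamma} \frac{u(z)}{z - z_0}\,dz,$$

and thus Cauchy's integral formula follows.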
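Both cases of formula (9.4) can be observed numerically. The following sketch assumes $K$ is the closed unit disk and $u = \exp$; the function name `cauchy_integral` and the two sample points are hypothetical and serve only as an illustration.

```python
import cmath
import math


def cauchy_integral(u, z0, c=0.0, r=1.0, samples=2000):
    """Approximate (1 / (2*pi*i)) * integral of u(z) / (z - z0) dz over |z - c| = r."""
    total = 0j
    for k in range(samples):
        w = cmath.exp(2j * math.pi * k / samples)
        z = c + r * w
        dz = 1j * r * w * (2 * math.pi / samples)
        total += u(z) / (z - z0) * dz
    return total / (2j * math.pi)


inside = 0.3 + 0.2j     # a point of K \ Gamma
outside = 2.0 + 0.0j    # a point which does not belong to K

value_inside = cauchy_integral(cmath.exp, inside)    # expected: exp(inside)
value_outside = cauchy_integral(cmath.exp, outside)  # expected: 0

print(value_inside, cmath.exp(inside))
print(abs(value_outside))
```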
Cauchy's integral formula (9.4) has many applications, practical and theoretical. We discuss now one of the most important applications which shows that every holomorphic function has locally a power series expansion and thus is analytic.


Theorem 9.4 (Cauchy's Integral Formulae II) Let $\Omega \subset \mathbb{C}$ be a nonempty open set. For every $c \in \Omega$ define

$$R = R(c) = \sup\left\{R' > 0 : B_{R'}(c) \subset \Omega\right\}. \tag{9.5}$$

Then, for every $u \in H(\Omega)$ and every $r \in (0, R)$ the following statements hold:

1. $u$ has a power series expansion in $B_r(c)$,

$$u(z) = \sum_{n=0}^{\infty} a_n (z - c)^n \qquad \forall z \in B_r(c),$$

and this power series expansion converges uniformly on every compact subset $K \subset B_r(c)$.




2. The coefficients $a_n$ of this power series expansion are given by Cauchy's integral formulae

$$a_n = a_n(c) = \frac{1}{2\pi i}\oint_{|z - c| = r} \frac{u(z)}{(z - c)^{n+1}}\,dz, \qquad n = 0, 1, 2, \ldots, \tag{9.6}$$

and these coefficients depend on $c$ and naturally on the function $u$ but not on the radius $r \in (0, R)$ which is used to calculate them.


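Proof Take any $c \in \Omega$ and determine $R = R(c)$ as in the theorem. Then for every $r \in (0, R)$ we know $\overline{B_r(c)} \subset \Omega$. Thus, Theorem 9.3 applies to $K = \overline{B_r(c)}$ and $\Gamma = \partial B_r(c) = \{z \in \mathbb{C} : |z - c| = r\}$ and hence for all $z \in B_r(c)$ one has

$$u(z) = \frac{1}{2\pi i}\oint_{|\xi - c| = r} \frac{u(\xi)}{\xi - z}\,d\xi.$$

Take any compact set $K \subset B_r(c)$. Since $|z - c| < r = |\xi - c|$ we can expand the function $\xi \mapsto \frac{1}{\xi - z}$ into a geometric series:

$$\frac{1}{\xi - z} = \frac{1}{\xi - c}\,\frac{1}{1 - \frac{z - c}{\xi - c}} = \sum_{n=0}^{\infty} \frac{(z - c)^n}{(\xi - c)^{n+1}}.$$

This series converges uniformly in $\xi \in \partial B_r(c)$ and $z \in K$. Hence, we can exchange the order of integration and summation in the above formula to get

$$u(z) = \sum_{n=0}^{\infty} \left[\frac{1}{2\pi i}\oint_{|\xi - c| = r} \frac{u(\xi)\,d\xi}{(\xi - c)^{n+1}}\right] (z - c)^n.$$

Hence, the function $u$ has a power series expansion in $B_r(c)$ with coefficients $a_n$ given by formula (9.6). The proof that the coefficients do not depend on $r \in (0, R)$ is left as an exercise.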
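The integral formulae (9.6) lend themselves to a direct numerical check. The sketch below is an illustration under stated assumptions only: it takes $u = \exp$ (so that $R = +\infty$ and $a_n = e^c/n!$), uses a hypothetical helper `taylor_coefficient`, and discretizes the circle with the trapezoidal rule. It also evaluates the bound $|a_n| r^n \le \sup_{|z-c|=r}|u(z)|$ for one radius.

```python
import cmath
import math


def taylor_coefficient(u, c, r, n, samples=2000):
    """a_n from (9.6): (1/(2*pi*i)) * integral of u(z)/(z - c)^(n+1) dz over |z - c| = r."""
    total = 0j
    for k in range(samples):
        w = cmath.exp(2j * math.pi * k / samples)
        dz = 1j * r * w * (2 * math.pi / samples)
        total += u(c + r * w) / (r * w) ** (n + 1) * dz
    return total / (2j * math.pi)


c = 0.5
exact = [math.exp(c) / math.factorial(n) for n in range(6)]
small = [taylor_coefficient(cmath.exp, c, 0.5, n) for n in range(6)]  # radius 0.5
large = [taylor_coefficient(cmath.exp, c, 2.0, n) for n in range(6)]  # radius 2.0

for n in range(6):
    print(n, exact[n], abs(small[n] - exact[n]), abs(large[n] - exact[n]))

# On |z - c| = 2 one has sup |exp(z)| = exp(c + 2); compare with |a_n| * 2^n.
estimate_holds = all(abs(large[n]) * 2.0**n <= math.exp(c + 2.0) for n in range(6))
print(estimate_holds)
```

Both radii give the same coefficients up to rounding, in line with the claim that the $a_n$ do not depend on $r$.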


Corollary 9.2. Let $\Omega \subset \mathbb{C}$ be a nonempty open set.

1. A function $u$ on $\Omega$ is holomorphic if, and only if, it is analytic: $H(\Omega) = A(\Omega)$.
2. The power series expansion of a holomorphic function $u$ on $\Omega$ is unique in a given disk $B_r(c) \subset \Omega$ of convergence.
3. Given $c \in \Omega$ determine $R = R(c)$ according to (9.5) and choose $r \in (0, R)$. Then, for every $u \in H(\Omega)$, the following holds:
   a) For $n = 0, 1, 2, \ldots$ the coefficient $a_n$ of the power series expansion of $u$ at the point $c$ and the $n$th complex derivative of $u$ at $c$ are related by

   $$u^{(n)}(c) = n!\,a_n. \tag{9.7}$$

   b) The $n$th derivative of $u$ at $c$ is bounded in terms of the values of $u$ on the boundary of the disk $B_r(c)$ according to the following formula (Cauchy estimates):

   $$|a_n|\,r^n \le \sup_{|z - c| = r} |u(z)|. \tag{9.8}$$


Proof In the discussion following Theorem 9.1, we saw that every analytic function is holomorphic. The previous theorem shows that conversely every holomorphic function is analytic. The uniqueness of the power series expansion of a holomorphic function at a point $c \in \Omega$ was shown in Theorem 9.4.

The Cauchy estimates are a straightforward consequence of the Cauchy formulae (9.6):

$$|a_n| \le \frac{1}{2\pi}\oint_{|z - c| = r} \frac{|u(z)|\,|dz|}{|z - c|^{n+1}} \le \frac{1}{2\pi}\,\frac{1}{r^{n+1}} \sup_{|z - c| = r} |u(z)|\; 2\pi r.$$

This estimate implies (9.8) and thus we conclude.


9.3. Some Properties of Holomorphic Functions 


As a consequence of the Cauchy theory, we derive some very important properties 
of holomorphic functions which themselves have many important applications. 


Corollary 9.3 (Theorem of Liouville) The only bounded functions in H(C) are the 
constants, i.e., if a function u is holomorphic on all of C and bounded there, then u 
is a constant function. 


Proof Suppose $u \in H(\mathbb{C})$ is bounded on $\mathbb{C}$ by $M$, i.e., $\sup_{z \in \mathbb{C}} |u(z)| = M < +\infty$. Since $u$ is holomorphic on $\mathbb{C}$ the value of $R = R(0)$ in (9.5) is $+\infty$. Hence, in the Cauchy estimates we can choose $r$ as large as we wish. Therefore, in this case we have $|a_n| \le \frac{M}{r^n}$ for every $r > 0$. It follows that $a_n = 0$ for $n = 1, 2, \ldots$, and thus Theorem 9.4 shows that $u$ is constant.


Corollary 9.4 (Fundamental Theorem of Algebra) Suppose $P$ is a polynomial of degree $N \ge 1$ with coefficients $a_n \in \mathbb{C}$, i.e., $P(z) = \sum_{n=0}^{N} a_n z^n$, $a_N \ne 0$. Then there are complex numbers $\{z_1, \ldots, z_N\}$, the roots of $P$, which are unique up to ordering such that

$$P(z) = a_N (z - z_1) \cdots (z - z_N) \qquad \forall z \in \mathbb{C}.$$

If all the coefficients of the polynomial $P$ are real, then $P$ has either only real roots or, if complex roots exist, they occur as pairs of complex numbers which are complex conjugate to each other and have the same multiplicity; in such a case the polynomial factorizes as

$$P(z) = a_N (z - x_1) \cdots (z - x_m)\,(z - z_1)(z - \bar z_1) \cdots (z - z_k)(z - \bar z_k) \qquad \forall z \in \mathbb{C}.$$

Here $x_1, \ldots, x_m$ are the real roots of $P$ and $z_1, \bar z_1, \ldots, z_k, \bar z_k$ are the complex roots of $P$; hence $m + 2k = N$. For real $z$ each factor $(z - z_j)(z - \bar z_j)$ equals $|z - z_j|^2$.




Proof In a first and basic step, we show that a polynomial which is not constant has at least one root. Suppose $P$ is a polynomial of degree $N \ge 1$ which has no roots in $\mathbb{C}$. Then we know that the function $z \mapsto \frac{1}{P(z)}$ is holomorphic on $\mathbb{C}$. We write the polynomial in the form

$$P(z) = a_N z^N \left[1 + \frac{a_{N-1}}{a_N z} + \cdots + \frac{a_0}{a_N z^N}\right]$$

and choose $R$ so large that

$$\left|\frac{a_{N-1}}{a_N z} + \cdots + \frac{a_0}{a_N z^N}\right| \le \frac{1}{2}$$

for all $|z| \ge R$. It follows that

$$|P(z)| \ge \frac{1}{2}\,|a_N|\,R^N \qquad \forall\, |z| \ge R.$$

On the compact set $K_R = \{z \in \mathbb{C} : |z| \le R\}$ the continuous function $|P|$ is strictly positive (since we have assumed that $P$ has no roots), i.e.,

$$b = b_R = \inf_{z \in K_R} |P(z)| > 0.$$

It thus follows that $\frac{1}{P(z)}$ is bounded on $\mathbb{C}$:

$$\left|\frac{1}{P(z)}\right| \le \max\left\{\frac{1}{b}, \frac{2}{|a_N|\,R^N}\right\} \qquad \forall z \in \mathbb{C}.$$

By Liouville's theorem (Corollary 9.3) we conclude that $\frac{1}{P(z)}$ and thus $P(z)$ is constant, which is a contradiction to our hypothesis that the degree $N$ of $P$ is larger than or equal to 1. We deduce that a polynomial of degree $N \ge 1$ has at least one root, i.e., for at least one $z \in \mathbb{C}$, one has $P(z) = 0$.

In order to complete the proof, a proof by induction with respect to the degree 
N has to be done. For details we refer to the Exercises where the special case of 
polynomials with real coefficients is also considered. 

Holomorphic functions differ from functions of class $C^\infty$ in a very important way: If all derivatives of two holomorphic functions agree in one point, then these functions agree everywhere, if the domain is “connected.” As we have seen earlier this is not at all the case for $C^\infty$-functions which are not holomorphic.


Theorem 9.5 (Identity Theorem) Suppose that $\Omega \subset \mathbb{C}$ is a nonempty open and connected set and $f, g : \Omega \to \mathbb{C}$ are two holomorphic functions. The following statements are equivalent:

(a) $f = g$
(b) The set of all points in $\Omega$ at which $f$ and $g$ agree, i.e., the set $\{z \in \Omega : f(z) = g(z)\}$, has an accumulation point $c \in \Omega$
(c) There is a point $c \in \Omega$ in which all complex derivatives of $f$ and $g$ agree: $f^{(n)}(c) = g^{(n)}(c)$ for all $n = 0, 1, 2, \ldots$.


Proof The implication (a) $\Rightarrow$ (b) is trivial. In order to show that (b) implies (c), introduce the holomorphic function $h = f - g$ on $\Omega$. According to (b) the set $M = \{z \in \Omega : h(z) = 0\}$ of zeroes of $h$ has an accumulation point $c \in \Omega$. Suppose that $h^{(m)}(c) \ne 0$ for some $m \in \mathbb{N}$. We can assume that $m$ is the smallest number with this property. Then in some open disk $B$ around $c$ we can write $h(z) = (z - c)^m h_m(z)$ with $h_m(z) = \sum_{n=m}^{\infty} \frac{h^{(n)}(c)}{n!} (z - c)^{n-m}$ and $h_m(c) \ne 0$. Continuity of $h_m$ implies that $h_m(z) \ne 0$ for all points $z$ in some neighborhood $U$ of $c$, $U \subset B$. It follows that the only point in $U$ in which $h$ vanishes is the point $c$, hence this point is an isolated point of $M$. This contradiction implies $h^{(n)}(c) = 0$ for all $n = 0, 1, 2, \ldots$ and statement (c) holds.

For the proof of the implication (c) $\Rightarrow$ (a), we introduce again the holomorphic function $h = f - g$ and consider, for $k = 0, 1, 2, \ldots$, the sets $N_k = \{z \in \Omega : h^{(k)}(z) = 0\}$. Since the function $h^{(k)}$ is continuous, the set $N_k$ is closed in $\Omega$. Hence, the intersection $N = \bigcap_{k=0}^{\infty} N_k$ of these sets is closed too. But $N$ is at the same time open: Take any $z \in N$. Since $h$ is holomorphic in $\Omega$ its Taylor series at $z$ converges in some open nonempty disk $B$ with center $z$. Since $z \in N$, all Taylor coefficients of this series vanish and it follows that $h^{(k)}|_B = 0$ for all $k \in \mathbb{N}$. This implies $B \subset N$ and we conclude that $N$ is open. Since $\Omega$ is assumed to be connected and $N$ is not empty ($c \in N$ because of (c)) we conclude $N = \Omega$ and thus $f = g$.

There are other versions and some extensions of the identity theorem for 
holomorphic functions, see [1]. 

Another important application of Cauchy’s integral formula (9.4) is the classifica- 
tion of isolated singularities of a function and the corresponding series representation. 
Here, one says that a complex function $u$ has an isolated singularity at a point $c \in \mathbb{C}$ if, and only if, there is some $R > 0$ such that $u$ is holomorphic in the set

$$K_{0,R}(c) = \{z \in \mathbb{C} : 0 < |z - c| < R\},$$


which is a disk of radius R from which the center c is removed. If a function is 
holomorphic in such a set it allows a characteristic series representation which gives 
the classification of isolated singularities. This series representation is in terms of 
powers and inverse powers of z — c and is called the Laurent expansion of u. 


Theorem 9.6 (Laurent Expansion) For $0 \le r < R \le +\infty$ consider the annulus $K_{r,R}(c) = \{z \in \mathbb{C} : r < |z - c| < R\}$ with center $c$ and radii $r$ and $R$. Every function $u$ which is holomorphic in $K_{r,R}(c)$ has the unique Laurent expansion

$$u(z) = \sum_{n=-\infty}^{+\infty} a_n (z - c)^n \qquad \forall z \in K_{r,R}(c), \tag{9.9}$$

which converges uniformly on every compact subset $K \subset K_{r,R}(c)$. The coefficients $a_n$ of this expansion are given by

$$a_n = \frac{1}{2\pi i}\oint_{|t - c| = \rho} \frac{u(t)}{(t - c)^{n+1}}\,dt \qquad \forall n \in \mathbb{Z}, \tag{9.10}$$

where $\rho \in (r, R)$ is arbitrary. These coefficients depend only on the function $u$ and on the annulus but not on the radius $\rho \in (r, R)$.


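Proof Consider any compact set $K \subset K_{r,R}(c)$. There are radii $r_1, r_2$ such that for all $z \in K$,

$$r < r_1 < |z - c| < r_2 < R.$$

Apply Cauchy's integral formula (9.4) to the closed annulus $\overline{K_{r_1,r_2}(c)}$ and a given function $u \in H(K_{r,R}(c))$. With both circles positively oriented this yields

$$u(z) = -\frac{1}{2\pi i}\oint_{|t - c| = r_1} \frac{u(t)}{t - z}\,dt + \frac{1}{2\pi i}\oint_{|t - c| = r_2} \frac{u(t)}{t - z}\,dt \qquad \forall z \in K.$$

Uniformly in $z \in K$ and $|t - c| = r_1$, respectively $|t - c| = r_2$, one has

$$\frac{|t - c|}{|z - c|} = \frac{r_1}{|z - c|} \le \alpha < 1, \quad \text{respectively} \quad \frac{|z - c|}{|t - c|} = \frac{|z - c|}{r_2} \le \beta < 1.$$

The convergence of the geometric series $\sum_{n=0}^{\infty} q^n$ for $0 < q < 1$ ensures the uniform convergence of the series

$$\frac{1}{z - t} = \frac{1}{z - c}\sum_{n=0}^{\infty} \left(\frac{t - c}{z - c}\right)^n \qquad \forall\, |t - c| = r_1, \ \forall z \in K,$$

$$\frac{1}{t - z} = \frac{1}{t - c}\sum_{n=0}^{\infty} \left(\frac{z - c}{t - c}\right)^n \qquad \forall\, |t - c| = r_2, \ \forall z \in K.$$

Therefore, we may exchange the order of summation and integration in the above integral representation of $u$ and obtain uniformly in $z \in K$,

$$u(z) = \sum_{n=0}^{\infty} \left[\frac{1}{2\pi i}\oint_{|t - c| = r_2} \frac{u(t)\,dt}{(t - c)^{n+1}}\right] (z - c)^n + \sum_{n=0}^{\infty} \left[\frac{1}{2\pi i}\oint_{|t - c| = r_1} u(t)(t - c)^n\,dt\right] (z - c)^{-n-1}.$$

If we choose $-n - 1$ as new summation index in the second series, we arrive at the Laurent expansion (9.9) with coefficients given by (9.10). A straightforward application of (9.4) shows that the integrals

$$\oint_{|t - c| = \rho} \frac{u(t)\,dt}{(t - c)^m} \qquad \forall m \in \mathbb{Z}$$

are independent of the choice of $\rho \in (r, R)$ and thus we conclude.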
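Formula (9.10) can be tried out on a concrete function. The sketch below assumes the example $u(z) = \frac{1}{z(1 - z)} = \frac{1}{z} + \frac{1}{1 - z}$, which is holomorphic in the annulus $K_{0,1}(0)$ and whose Laurent coefficients there are $a_n = 1$ for $n \ge -1$ and $a_n = 0$ for $n < -1$. The helper name `laurent_coefficient` and the trapezoidal discretization are hypothetical choices, not part of the text.

```python
import cmath
import math


def laurent_coefficient(u, c, rho, n, samples=2000):
    """a_n from (9.10): (1/(2*pi*i)) * integral of u(t)/(t - c)^(n+1) dt over |t - c| = rho."""
    total = 0j
    for k in range(samples):
        w = cmath.exp(2j * math.pi * k / samples)
        dt = 1j * rho * w * (2 * math.pi / samples)
        total += u(c + rho * w) * (rho * w) ** (-(n + 1)) * dt
    return total / (2j * math.pi)


def u(z):
    # Holomorphic for 0 < |z| < 1; equals 1/z + 1/(1 - z).
    return 1 / (z * (1 - z))


orders = range(-3, 4)
coeffs_a = {n: laurent_coefficient(u, 0.0, 0.3, n) for n in orders}  # rho = 0.3
coeffs_b = {n: laurent_coefficient(u, 0.0, 0.6, n) for n in orders}  # rho = 0.6
expected = {n: (1.0 if n >= -1 else 0.0) for n in orders}

for n in orders:
    print(n, coeffs_a[n], coeffs_b[n])
```

The two radii give the same coefficients up to rounding, which illustrates the independence of $\rho \in (r, R)$.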




The announced classification of isolated singularities of a function $u$ is based on the Laurent expansion of $u$ at the singularities and classifies these singularities according to the number of coefficients $a_n \ne 0$ for $n < 0$ in the Laurent expansion. In detail one proceeds in the following way.

Suppose $c \in \mathbb{C}$ is an isolated singularity of a function $u$. Then there is an $R = R(u, c) > 0$ such that $u$ is holomorphic in the annulus $K_{0,R}(c)$ and thus has a unique Laurent expansion there:

$$u(z) = \sum_{n=-\infty}^{+\infty} a_n (z - c)^n \qquad \forall z \in K_{0,R}(c).$$
One distinguishes three cases: 


a) $a_n = 0$ for all $n < 0$. Then $c$ is called a removable singularity. Initially $u$ is not defined at $z = c$, but the limit $\lim_{z \to c} u(z)$ exists and is used to define the value of $u$ at $z = c$. In this way $u$ becomes defined and holomorphic in the disk $\{z \in \mathbb{C} : |z - c| < R\}$. A well-known example is $u(z) = \frac{\sin z}{z}$ for all $z \in \mathbb{C}$, $z \ne 0$. Using the power series expansion for $\sin z$ we find easily the Laurent series for $u$ at $z = 0$ and see that $\lim_{z \to 0} u(z)$ exists.

b) There is $k \in \mathbb{N}$, $k > 0$, such that $a_n = 0$ for all $n \in \mathbb{Z}$, $n < -k$ and $a_{-k} \ne 0$. Then the point $z = c$ is called a pole of order $k$ of the function $u$. One has $|u(z)| \to +\infty$ as $z \to c$. A simple example is the function $u(z) = z^{-3}$ for $z \in \mathbb{C}$, $z \ne 0$. It has a pole of order 3 in $z = 0$.

c) $a_n \ne 0$ for infinitely many $n \in \mathbb{Z}$, $n < 0$. In this case the point $c$ is called an essential singularity of $u$. As an example we mention the function $u(z) = e^{\frac{1}{z}}$ defined for all $z \in \mathbb{C} \setminus \{0\}$. The well-known power series expansion of the exponential function shows easily that the Laurent series of $u$ at $z = 0$ is given by $\sum_{n=0}^{\infty} \frac{1}{n!} z^{-n}$ and thus $u$ has an essential singularity at $z = 0$.
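For orientation, the Laurent series behind the three examples can be written out explicitly. This is a worked illustration based only on the standard power series of $\sin z$ and $e^z$:

```latex
% a) removable singularity at z = 0: no negative powers occur
\frac{\sin z}{z} = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!}\, z^{2k}
                 = 1 - \frac{z^2}{6} + \frac{z^4}{120} - \cdots ,
\qquad \lim_{z \to 0} \frac{\sin z}{z} = 1 .

% b) pole of order 3 at z = 0: a_{-3} = 1 and all other coefficients vanish
u(z) = z^{-3}, \qquad a_{-3} = 1, \quad a_n = 0 \ \text{for } n \neq -3 .

% c) essential singularity at z = 0: a_{-n} = 1/n! \neq 0 for every n \geq 0
e^{1/z} = \sum_{n=0}^{\infty} \frac{1}{n!}\, z^{-n}
        = 1 + \frac{1}{z} + \frac{1}{2 z^2} + \frac{1}{6 z^3} + \cdots .
```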


Assume that a function $u$ has an isolated singularity at a point $c$. Then, in a certain annulus $K_{0,R}(c)$ it has a unique Laurent expansion (9.9) where the coefficients $a_n$ have the explicit integral representation (9.10). For $n = -1$ this integral representation is

$$a_{-1} = \frac{1}{2\pi i}\oint_{|z - c| = \rho} u(z)\,dz \tag{9.11}$$

for a suitable radius $\rho$. This coefficient is called the residue of the function $u$ at the isolated singularity $c$, usually denoted as

$$a_{-1} = \operatorname{Res}(u, c).$$

If $c$ is a pole of order 1, the Laurent expansion shows that the residue can be calculated in a simple way as

$$\operatorname{Res}(u, c) = \lim_{z \to c} (z - c)\,u(z). \tag{9.12}$$




In most cases it is fairly easy to determine this limit and thus the residue. This offers 
a convenient way to determine the value of the integral in (9.11) and is the starting 
point for a method which determines the values of similar path integrals. 


Theorem 9.7 (Theorem of Residues) Suppose $\Omega \subset \mathbb{C}$ is a nonempty open set and $D \subset \Omega$ a discrete subset (this means that in every open disk $K_r(z) = \{\xi \in \mathbb{C} : |\xi - z| < r\}$ there are only a finite number of points from $D$). Furthermore assume that $K$ is a compact subset of $\Omega$ such that the boundary $\Gamma = \partial K$ of $K$ with standard mathematical orientation is piecewise smooth and does not contain any point from $D$. Then, for every $u \in H(\Omega \setminus D)$, the following holds:

a) The number of isolated singularities of $u$ in $K$ is finite.

b) Suppose $\{z_0, z_1, \ldots, z_N\}$ are the isolated singularities of $u$ in $K$, then one has

$$\oint_{\Gamma} u(z)\,dz = 2\pi i \sum_{n=0}^{N} \operatorname{Res}(u, z_n). \tag{9.13}$$


Proof Given a point $z \in K$, there is an open disk in $\Omega$ which contains at most one point from $D$ since $D$ is discrete. Since $K$ is compact, a finite number of such disks cover $K$. This proves part (a).

Suppose that $z_0, z_1, \ldots, z_N$ are the isolated singularities of $u$ in $K$. One can find radii $r_0, r_1, \ldots, r_N$ such that the closed disks $\overline{K_{r_j}(z_j)}$ are pairwise disjoint. Now choose the orientation of the boundaries $\partial K_{r_j}(z_j) = -\gamma_j$ of these disks in such a way that $\Gamma \cup \bigcup_{j=0}^{N} \gamma_j$ is the oriented boundary of some compact set $K' \subset \Omega$. By construction the function $u$ is holomorphic in some open neighborhood of $K'$ and thus Cauchy's theorem (9.3) applies to give

$$\oint_{\Gamma} u(t)\,dt + \sum_{j=0}^{N} \oint_{\gamma_j} u(t)\,dt = 0,$$

i.e., by (9.11)

$$\oint_{\Gamma} u(t)\,dt = \sum_{j=0}^{N} \oint_{\partial K_{r_j}(z_j)} u(t)\,dt = 2\pi i \sum_{j=0}^{N} \operatorname{Res}(u, z_j)$$

and we conclude.


Remark 9.1 


1. Only in the case of a pole of order 1 can we calculate the residue by the simple 
formula (9.12). In general, one has to use the Laurent series. A discussion of some 
other special cases in which it is relatively easy to find the residue without going 
to the Laurent expansion is explained in most textbooks on complex analysis. 

2. In the case of $u$ being the quotient of two polynomials $P$ and $Q$, $u(z) = \frac{P(z)}{Q(z)}$, one has a pole of order 1 at a point $z = c$ if $Q(c) = 0$, $Q'(c) \ne 0$, and $P(c) \ne 0$. Then the residue of $u$ at the point $c$ can be calculated by formula (9.12). The result is a convenient formula

$$\operatorname{Res}(u, c) = \lim_{z \to c} (z - c)\,u(z) = \lim_{z \to c} \frac{P(z)}{\frac{Q(z) - Q(c)}{z - c}} = \frac{P(c)}{Q'(c)}.$$
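The theorem of residues (9.13) together with the formula $\operatorname{Res}(u, c) = P(c)/Q'(c)$ can be checked numerically. The following sketch is an illustration under assumptions: it uses $u(z) = \frac{e^z}{z(z - 1)}$, where the numerator is an entire function rather than a polynomial (the same computation of the limit applies at a simple zero of $Q$), the contour $\Gamma = \{|z| = 2\}$, and hypothetical helper names.

```python
import cmath
import math


def circle_integral(u, c, r, samples=2000):
    """Trapezoidal approximation of the integral of u(z) dz over |z - c| = r."""
    total = 0j
    for k in range(samples):
        w = cmath.exp(2j * math.pi * k / samples)
        total += u(c + r * w) * (1j * r * w) * (2 * math.pi / samples)
    return total


def P(z):
    return cmath.exp(z)


def Q_prime(z):
    return 2 * z - 1          # derivative of Q(z) = z^2 - z


def u(z):
    return P(z) / (z * (z - 1))


poles = [0.0, 1.0]                                 # simple zeros of Q inside |z| = 2
residues = [P(c) / Q_prime(c) for c in poles]      # -1 and e

lhs = circle_integral(u, 0.0, 2.0)
rhs = 2j * math.pi * sum(residues)
print(lhs, rhs)
```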


9.4 Exercises 


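1. Write a complex-valued function $f : \Omega \to \mathbb{C}$ on some open set $\Omega \subset \mathbb{C}$ in terms of its real and imaginary parts, $f(x + iy) = u(x, y) + iv(x, y)$ for all $z = x + iy \in \Omega$ where $u$ and $v$ are real-valued functions. Show: If $\bar\partial f(z) = 0$ on $\Omega$, then the functions $u, v$ satisfy the Cauchy-Riemann equations

$$\frac{\partial u}{\partial x}(x, y) = +\frac{\partial v}{\partial y}(x, y), \qquad \frac{\partial u}{\partial y}(x, y) = -\frac{\partial v}{\partial x}(x, y).$$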


2. Prove Part 4 of Theorem 9.1. 

3. Show: In Cauchy's integral formula (9.6) the right-hand side is independent of $r$, $0 < r < R$.

4. Complete the proof of Corollary 9.4.
Hint: For the case of a real polynomial prove first that $P(z) = 0$ implies $P(\bar z) = 0$ and observe that a complex root $z$ and its complex conjugate $\bar z$ have the same multiplicity.


References 


1. Hörmander L. An introduction to complex analysis in several variables. Princeton: Van Nostrand; 1967.

2. Remmert R. Theory of complex functions. Graduate texts in mathematics, vol 122. 4th ed. 
Berlin: Springer; 1998. 


Chapter 10 
Fourier Transformation 


Our goal in this chapter is to define the Fourier transformation in a setting which is as 
general as possible and to discuss the most important properties of this transforma- 
tion. This is followed by some typical and important applications, mainly in the theory 
of partial differential operators with constant coefficients as they occur in physics. 

If one wants to introduce the Fourier transformation on the space $\mathcal{D}'(\mathbb{R}^n)$ of all distributions on $\mathbb{R}^n$, one encounters a natural difficulty which has its origin in the fact that general distributions are not restricted in their growth when one approaches the boundary of their domain of definition. It turns out that the growth restrictions which control tempered distributions are sufficient to allow a convenient and powerful Fourier transformation on the space $\mathcal{S}'(\mathbb{R}^n)$ of all tempered distributions. As a matter of fact, the space of tempered distributions was introduced for this purpose.

The starting point of the theory of Fourier transformation is very similar to that of the theory of Fourier series. Under well-known conditions a periodic complex valued function can be represented as the sum of exponential functions of the form $a_n e^{i n \kappa x}$, $n \in \mathbb{Z}$, $a_n \in \mathbb{C}$, where $\kappa$ is determined by the period of the function in question. The theory of Fourier transformation aims at a similar representation without assuming periodicity, but allowing that the summation index $n$ might have to vary continuously so that the sum is replaced by an integral.

On a formal level the transition between the two representations is achieved in the following way. Suppose that $f : \mathbb{R} \to \mathbb{C}$ is an integrable continuous function. For each $T > 0$ introduce the auxiliary function $f_T$ with period $2T$ which is equal to $f$ on the interval $[-T, T]$. Then $f_T$ has a representation in terms of a Fourier series

$$f_T(x) = \frac{1}{2T}\sum_{n \in \mathbb{Z}} c_n e^{i\frac{n\pi}{T}x} \quad \text{with} \quad c_n = \int_{-T}^{+T} f(x)\,e^{-i\frac{n\pi}{T}x}\,dx.$$

Now introduce $\nu = \frac{n\pi}{T}$ and $a_\nu = c_n$ and rewrite the above representation as

$$f_T(x) = \frac{1}{2T}\sum_{\nu} a_\nu e^{i\nu x} \quad \text{with} \quad a_\nu = \int_{-T}^{+T} f(x)\,e^{-i\nu x}\,dx.$$
© Springer International Publishing Switzerland 2015 133 


P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_10 




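Two successive values of the summation index differ by $\frac{\pi}{T}$; thus, formally, we get in the limit $T \to +\infty$,

$$f(x) = \frac{1}{2\pi}\int_{\mathbb{R}} a_\nu e^{i\nu x}\,d\nu \quad \text{with} \quad a_\nu = \int_{\mathbb{R}} f(x)\,e^{-i\nu x}\,dx.$$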


The following section will give a precise meaning to these relations. 

In order to be able to define and to study the Fourier transformation for distributions, we begin by establishing the basic properties of the Fourier transformation on various spaces of functions. In the first section we introduce and study the Fourier transformation on the space $L^1(\mathbb{R}^n)$ of Lebesgue integrable functions. Recall that $L^1(\mathbb{R}^n)$ denotes the space (of equivalence classes) of measurable functions $f : \mathbb{R}^n \to \mathbb{C}$ which are absolutely integrable, i.e., for which the norm

$$\|f\|_1 = \int_{\mathbb{R}^n} |f(x)|\,dx$$

is finite. The main result of Sect. 10.2 is that the Fourier transformation is an isomorphism of the topological vector space $\mathcal{S}(\mathbb{R}^n)$ which is the test function space for tempered distributions. This easily implies that the Fourier transform can be defined on the space of tempered distributions by duality. Section 10.3 then establishes the most important properties of the Fourier transformation for tempered distributions. In the final section on applications, we come back to the study of linear partial differential operators with constant coefficients and the improvements of the solution theory one has in the context of tempered distributions. There we will learn, among other things, that with the help of the Fourier transformation it is often fairly easy to find elementary solutions of linear partial differential operators with constant coefficients.


10.1 Fourier Transformation for Integrable Functions 


For $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ and $p = (p_1, \ldots, p_n) \in \mathbb{R}^n$, denote by $p \cdot x = p_1 x_1 + \cdots + p_n x_n$ the Euclidean inner product. Since for all $x, p \in \mathbb{R}^n$ one has $|e^{ip \cdot x}| = 1$, all the functions $x \mapsto e^{-ip \cdot x} f(x)$, $p \in \mathbb{R}^n$, $f \in L^1(\mathbb{R}^n)$, are integrable and thus we get a well-defined function $\tilde f : \mathbb{R}^n \to \mathbb{C}$ by defining, for all $p \in \mathbb{R}^n$,

$$\tilde f(p) = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} e^{-ip \cdot x} f(x)\,dx. \tag{10.1}$$

This function $\tilde f$ is called the Fourier transform of $f$ and the map $\mathcal{F}$ defined on $L^1(\mathbb{R}^n)$ by $\mathcal{F}f \equiv \mathcal{F}(f) = \tilde f$ is called the Fourier transformation (on $L^1(\mathbb{R}^n)$).

Remark 10.1 The choice of the normalization factor $(2\pi)^{-\frac{n}{2}}$ and the choice of the sign of the argument of the exponential function in the definition of the Fourier transform are not uniform in the literature (see, for instance, [1-7]). Each choice has some advantage and some disadvantage.




In our normalization the Fourier transform of Dirac's delta distribution on $\mathbb{R}^n$ will be $\mathcal{F}\delta = (2\pi)^{-\frac{n}{2}}$.

The starting point of our investigation of the properties of the Fourier transform 
is the following basic result. 


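Lemma 10.1 (Riemann-Lebesgue) The Fourier transform $\tilde f = \mathcal{F}f$ of $f \in L^1(\mathbb{R}^n)$ has the following properties:

a) $\tilde f$ is a continuous and bounded function on $\mathbb{R}^n$.
b) $\mathcal{F} : L^1(\mathbb{R}^n) \to L^\infty(\mathbb{R}^n)$ is a continuous linear map. One has the following bound:

$$\|\mathcal{F}f\|_\infty \equiv \sup_{p \in \mathbb{R}^n} |\tilde f(p)| \le (2\pi)^{-\frac{n}{2}} \|f\|_1.$$

c) $\tilde f$ vanishes at infinity, i.e.,

$$\lim_{|p| \to \infty} \tilde f(p) = 0.$$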

Proof The bound given in part b) is evident from the definition (10.1) of the Fourier transformation. The basic rules of Lebesgue integration imply that $\mathcal{F}$ is a linear map from $L^1(\mathbb{R}^n)$ into $L^\infty(\mathbb{R}^n)$. In order to prove continuity of $\tilde f$ at any point $p \in \mathbb{R}^n$, for any $f \in L^1(\mathbb{R}^n)$, consider any sequence of points $p_k$ which converges to $p$. It follows that

$$\lim_{k \to \infty} \tilde f(p_k) = (2\pi)^{-\frac{n}{2}} \lim_{k \to \infty} \int_{\mathbb{R}^n} e^{-ip_k \cdot x} f(x)\,dx = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} e^{-ip \cdot x} f(x)\,dx = \tilde f(p)$$

since $e^{-ip_k \cdot x} f(x) \to e^{-ip \cdot x} f(x)$ as $k \to \infty$, for almost all $x \in \mathbb{R}^n$, and $|e^{-ip_k \cdot x} f(x)| \le |f(x)|$ for all $x \in \mathbb{R}^n$ and all $k \in \mathbb{N}$, so that Lebesgue's theorem on dominated convergence implies the convergence of the integrals. The sequence test for continuity now proves continuity of $\tilde f$ at $p$. Thus continuity of $\tilde f$ follows. This proves parts a) and b).

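The proof of part c) is more involved. We start with the observation $e^{-i\pi} = -1$ and deduce, for all $p \in \mathbb{R}^n$, $p \ne 0$:

$$(2\pi)^{\frac{n}{2}} \tilde f(p) = -\int_{\mathbb{R}^n} e^{-ip \cdot \left(x + \frac{\pi p}{p^2}\right)} f(x)\,dx = -\int_{\mathbb{R}^n} e^{-ip \cdot x} f\left(x - \frac{\pi p}{p^2}\right) dx.$$

Recall the definition of translation by a vector $a$ of a function $f$, $f_a(x) = f(x - a)$ for all $x \in \mathbb{R}^n$. Then with $a = \frac{\pi p}{p^2}$ for $p \ne 0$, we can write

$$(\mathcal{F}f)(p) = \frac{1}{2}\left[(\mathcal{F}f)(p) - (\mathcal{F}f_a)(p)\right]\Big|_{a = \frac{\pi p}{p^2}};$$

hence, using linearity of $\mathcal{F}$ and the estimate of part b), it follows that

$$|(\mathcal{F}f)(p)| \le \frac{1}{2}(2\pi)^{-\frac{n}{2}} \left\| f - f_{\frac{\pi p}{p^2}} \right\|_1.$$

This shows that one can prove part c) by showing

$$\lim_{a \to 0} \|f - f_a\|_1 = 0 \qquad \forall f \in L^1(\mathbb{R}^n),$$

i.e., translations act continuously on $L^1(\mathbb{R}^n)$. This is a well-known result in the theory of Lebesgue integrals. In the Exercises one is asked to prove this result, first for continuous functions with compact support and then for general elements in $L^1(\mathbb{R}^n)$. This concludes the proof.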

In general, it is not so easy to calculate the Fourier transform $\tilde f$ of a function $f$ in $L^1(\mathbb{R}^n)$ explicitly. We give now a few examples where this calculation is straightforward. A more comprehensive list will follow at the end of this chapter.


Example 10.1 


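1. Denote by $\chi_{[-a,a]}$ the characteristic function of the symmetric interval $[-a, a]$, that is, $\chi_{[-a,a]}(x) = 1$ for $x \in [-a, a]$ and $\chi_{[-a,a]}(x) = 0$ otherwise. Clearly this function is integrable and for $\mathcal{F}\chi_{[-a,a]}$ we find, for any $p \in \mathbb{R} \setminus \{0\}$:

$$\tilde\chi_{[-a,a]}(p) = (2\pi)^{-\frac{1}{2}} \int_{\mathbb{R}} e^{-ipx} \chi_{[-a,a]}(x)\,dx = (2\pi)^{-\frac{1}{2}} \int_{-a}^{+a} e^{-ipx}\,dx = (2\pi)^{-\frac{1}{2}} \left.\frac{e^{-ipx}}{-ip}\right|_{-a}^{+a} = \frac{2}{\sqrt{2\pi}}\,\frac{\sin ap}{p}.$$

It is easy to see that the apparent singularity at $p = 0$ is removable.

2. Consider the function $f(x) = e^{-x^2}$. $f$ is certainly integrable and one has $\int_{\mathbb{R}} e^{-x^2}\,dx = \sqrt{\pi}$. In order to calculate the Fourier transform of this function, we have to rely on Cauchy's Integral Theorem 9.2 applied to the function $z \mapsto e^{-z^2}$ which is holomorphic on the complex plane $\mathbb{C}$.

$$(2\pi)^{\frac{1}{2}} \tilde f(p) = \int_{-\infty}^{+\infty} e^{-ipx} e^{-x^2}\,dx = e^{-\frac{p^2}{4}} \int_{-\infty}^{+\infty} e^{-\left(x + \frac{ip}{2}\right)^2} dx$$

$$= e^{-\frac{p^2}{4}} \int_{C_p} e^{-z^2}\,dz, \qquad C_p = \left\{z = x + \tfrac{ip}{2} : x \in \mathbb{R}\right\}$$

$$= e^{-\frac{p^2}{4}} \int_{C_0} e^{-z^2}\,dz, \qquad C_0 = \{z = x : x \in \mathbb{R}\}$$

and thus we conclude that

$$\mathcal{F}(e^{-x^2})(p) = \frac{1}{\sqrt{2}}\,e^{-\frac{p^2}{4}}.$$

3. For some number $a > 0$ define the integrable function $f(x) = e^{-a|x|}$ for $x \in \mathbb{R}$. Its $L^1$-norm is $\|f\|_1 = \frac{2}{a}$. Its Fourier transform $\tilde f$ can be calculated as follows, for all $p \in \mathbb{R}$:

$$(2\pi)^{\frac{1}{2}} \tilde f(p) = \int_{-\infty}^{+\infty} e^{-ipx} e^{-a|x|}\,dx = \int_{-\infty}^{0} e^{-ipx + ax}\,dx + \int_{0}^{+\infty} e^{-ipx - ax}\,dx$$

$$= \left.\frac{e^{ax - ipx}}{a - ip}\right|_{-\infty}^{0} + \left.\frac{e^{-ax - ipx}}{-a - ip}\right|_{0}^{+\infty} = \frac{1}{a - ip} + \frac{1}{a + ip}.$$

We rewrite this as

$$\mathcal{F}(e^{-a|\cdot|})(p) = \frac{2}{\sqrt{2\pi}}\,\frac{a}{a^2 + p^2}.$$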

The following proposition collects a number of basic properties of the Fourier transformation. These properties say how the Fourier transformation acts on the translation, scaling, multiplication, and differentiation of functions. In addition, we learn that the Fourier transformation transforms a convolution product into an ordinary pointwise product. These properties are the starting point of the analysis of the Fourier transformation on the test function space $\mathcal{S}(\mathbb{R}^n)$ addressed in the next section and are deduced from the Riemann-Lebesgue lemma in a straightforward way.


Proposition 10.1 


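1. For $f \in L^1(\mathbb{R}^n)$ and $a \in \mathbb{R}^n$ the translation by $a$ is defined as $f_a(x) = f(x - a)$ for almost all $x \in \mathbb{R}^n$. These translations and the multiplication by a corresponding exponential function are related under the Fourier transformation according to the following formulae:

   a) $\mathcal{F}(e^{ia \cdot x} f)(p) = (\mathcal{F}f)_a(p) \qquad \forall p \in \mathbb{R}^n$,

   b) $\mathcal{F}(f_a)(p) = e^{-ia \cdot p}(\mathcal{F}f)(p) \qquad \forall p \in \mathbb{R}^n$.

2. For any $\lambda > 0$ define the scaled function $f_\lambda$ by $f_\lambda(x) = f\left(\frac{x}{\lambda}\right)$ for almost all $x \in \mathbb{R}^n$. Then, for $f \in L^1(\mathbb{R}^n)$, one has

$$(\mathcal{F}f_\lambda)(p) = \lambda^n (\mathcal{F}f)(\lambda p) \qquad \forall p \in \mathbb{R}^n.$$

3. For all $f, g \in L^1(\mathbb{R}^n)$ one has $f * g \in L^1(\mathbb{R}^n)$ and

$$\mathcal{F}(f * g) = (2\pi)^{\frac{n}{2}} (\mathcal{F}f) \cdot (\mathcal{F}g).$$

4. Suppose that $f \in L^1(\mathbb{R}^n)$ satisfies $x_j \cdot f \in L^1(\mathbb{R}^n)$ for some $j \in \{1, 2, \ldots, n\}$. Then the Fourier transform $\mathcal{F}f$ of $f$ is continuously differentiable with respect to the variable $p_j$ and one has

$$\frac{\partial}{\partial p_j}(\mathcal{F}f)(p) = \mathcal{F}(-i x_j f)(p) \qquad \forall p \in \mathbb{R}^n.$$

5. Suppose that $f \in L^1(\mathbb{R}^n)$ has a derivative with respect to the variable $x_j$ which is integrable, $\frac{\partial f}{\partial x_j} \in L^1(\mathbb{R}^n)$ for some $j \in \{1, 2, \ldots, n\}$. Then the following holds:

$$\mathcal{F}\left(\frac{\partial f}{\partial x_j}\right)(p) = i p_j (\mathcal{F}f)(p) \quad \text{and} \quad |p_j (\mathcal{F}f)(p)| \le (2\pi)^{-\frac{n}{2}} \left\|\frac{\partial f}{\partial x_j}\right\|_1 \qquad \forall p \in \mathbb{R}^n.$$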



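Proof The proof of the first two properties is straightforward and is done in the Exercises.

To prove the relation for the convolution product we apply Fubini's theorem on the exchange of the order of integration to conclude $\|f * g\|_1 \le \|f\|_1 \|g\|_1$ for all $f, g \in L^1(\mathbb{R}^n)$, hence $f * g \in L^1(\mathbb{R}^n)$. The same theorem and the first property justify the following calculations for all fixed $p \in \mathbb{R}^n$:

$$(2\pi)^{\frac{n}{2}} \mathcal{F}(f * g)(p) = \int_{\mathbb{R}^n} e^{-ip \cdot x} (f * g)(x)\,dx$$

$$= \int_{\mathbb{R}^n} e^{-ip \cdot x} \left(\int_{\mathbb{R}^n} f(x - y) g(y)\,dy\right) dx = \int_{\mathbb{R}^n} \left(\int_{\mathbb{R}^n} e^{-ip \cdot x} f(x - y)\,dx\right) g(y)\,dy$$

$$= (2\pi)^{\frac{n}{2}} \int_{\mathbb{R}^n} e^{-ip \cdot y} (\mathcal{F}f)(p)\,g(y)\,dy = (2\pi)^{n} (\mathcal{F}f)(p)(\mathcal{F}g)(p).$$

Now the third property follows easily.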
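The convolution rule $\mathcal{F}(f * g) = (2\pi)^{\frac{n}{2}} (\mathcal{F}f)(\mathcal{F}g)$ can be checked numerically for $n = 1$. The following sketch is an illustration under assumptions: the two Gaussians, the truncated grid, and the helper `fourier` are hypothetical choices, and both the convolution and the transforms are approximated by the midpoint rule on the same grid.

```python
import cmath
import math

H = 0.05
GRID = [-10.0 + (k + 0.5) * H for k in range(400)]  # midpoints of cells on [-10, 10]


def f(x):
    return math.exp(-x * x)


def g(x):
    return math.exp(-0.5 * x * x)


def fourier(values, p):
    """(2*pi)^(-1/2) * sum over the grid of exp(-i p x) * value(x) * H."""
    s = sum(cmath.exp(-1j * p * x) * v for x, v in zip(GRID, values))
    return s * H / math.sqrt(2 * math.pi)


f_vals = [f(x) for x in GRID]
g_vals = [g(x) for x in GRID]

# (f * g)(x) = integral of f(x - y) g(y) dy, approximated on the same grid
conv_vals = [H * sum(f(x - y) * gy for y, gy in zip(GRID, g_vals)) for x in GRID]

p = 1.3
lhs = fourier(conv_vals, p)
rhs = math.sqrt(2 * math.pi) * fourier(f_vals, p) * fourier(g_vals, p)
print(lhs, rhs)
```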
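In order to prove differentiability of $\mathcal{F}f$ under the assumptions stated above, take any $p \in \mathbb{R}^n$ and denote by $e_j = (0, \ldots, 0, 1, 0, \ldots, 0)$ the standard unit vector in $\mathbb{R}^n$ in coordinate direction $j$. By definition, for all $h \in \mathbb{R}$, $h \ne 0$, we find

$$\frac{\tilde f(p + h e_j) - \tilde f(p)}{h} = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} \frac{e^{-i(p + h e_j) \cdot x} - e^{-ip \cdot x}}{h}\,f(x)\,dx.$$

For arbitrary but fixed $x \in \mathbb{R}^n$ we know

$$\lim_{h \to 0} \frac{e^{-i(p + h e_j) \cdot x} - e^{-ip \cdot x}}{h} = -i x_j e^{-ip \cdot x}.$$

Furthermore, the estimate

$$\left|\frac{e^{-i(p + h e_j) \cdot x} - e^{-ip \cdot x}}{h}\right| \le |x_j| \qquad \forall x, p \in \mathbb{R}^n$$

is well known. Thus, a standard application of Lebesgue's theorem of dominated convergence implies, taking the hypothesis $x_j f \in L^1(\mathbb{R}^n)$ into account,

$$\lim_{h \to 0} \frac{\tilde f(p + h e_j) - \tilde f(p)}{h} = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} (-i x_j) e^{-ip \cdot x} f(x)\,dx,$$

and we conclude

$$\frac{\partial \tilde f}{\partial p_j}(p) = \mathcal{F}(-i x_j f)(p) \qquad \forall p \in \mathbb{R}^n.$$

This partial derivative is continuous by the Riemann-Lebesgue lemma and thus the fourth property follows.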


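In order to prove the fifth property, we start with the observation

$$f \in L^1(\mathbb{R}) \ \text{and} \ f' \in L^1(\mathbb{R}) \quad \Rightarrow \quad \lim_{|x| \to \infty} f(x) = 0.$$

This is shown in the Exercises. Now we calculate

$$i p_j (\mathcal{F}f)(p) = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} i p_j e^{-ip \cdot x} f(x)\,dx = -(2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} \frac{\partial}{\partial x_j}\left(e^{-ip \cdot x}\right) f(x)\,dx$$

and perform a partial integration with respect to $x_j$. By the above observation the boundary terms vanish under our hypotheses and thus this partial integration yields

$$(2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} e^{-ip \cdot x} \frac{\partial f}{\partial x_j}(x)\,dx = \mathcal{F}\left(\frac{\partial f}{\partial x_j}\right)(p).$$

We conclude by Lemma 10.1.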
Denote by $C_b(\mathbb{R}^n)$ the space of all bounded continuous functions $f : \mathbb{R}^n \to \mathbb{C}$ which vanish at infinity as expressed in the Riemann-Lebesgue lemma. Then this lemma shows that the Fourier transformation $\mathcal{F}$ maps the space $L^1(\mathbb{R}^n)$ into $C_b(\mathbb{R}^n)$. A natural and very important question is whether this map has an inverse and what this inverse is. In order to answer these questions some preparations are necessary.


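Lemma 10.2
1. For all $f, g \in L^1(\mathbb{R}^n)$ the following identity holds:

$$\int_{\mathbb{R}^n} f(x)(\mathcal{F}g)(x)\,dx = \int_{\mathbb{R}^n} (\mathcal{F}f)(y)\,g(y)\,dy.$$

2. Suppose $f, g \in L^1(\mathbb{R}^n)$ are continuous and bounded and their Fourier transforms $\mathcal{F}f$, $\mathcal{F}g$ belong to $L^1(\mathbb{R}^n)$ too. Then one has

$$f(0) \int_{\mathbb{R}^n} (\mathcal{F}g)(x)\,dx = g(0) \int_{\mathbb{R}^n} (\mathcal{F}f)(y)\,dy.$$

Proof If $f, g \in L^1(\mathbb{R}^n)$, then the function $(x, y) \mapsto e^{-ix \cdot y} f(x) g(y)$ belongs to $L^1(\mathbb{R}^n \times \mathbb{R}^n)$ and thus Fubini's theorem implies

$$I_1 \equiv (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} \left(\int_{\mathbb{R}^n} e^{-ix \cdot y} f(x) g(y)\,dy\right) dx = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n \times \mathbb{R}^n} e^{-ix \cdot y} f(x) g(y)\,dx\,dy$$

$$= (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} \left(\int_{\mathbb{R}^n} e^{-ix \cdot y} f(x) g(y)\,dx\right) dy \equiv I_2.$$

According to the definition of the Fourier transformation, we have

$$I_1 = \int_{\mathbb{R}^n} f(x)(\mathcal{F}g)(x)\,dx, \qquad I_2 = \int_{\mathbb{R}^n} (\mathcal{F}f)(y)\,g(y)\,dy.$$

Thus, the identity $I_1 = I_2$ proves the first part.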




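Next apply the identity of the first part to $f, g_\lambda \in L^1(\mathbb{R}^n)$, for $g \in L^1(\mathbb{R}^n)$ and $g_\lambda(y) = g\left(\frac{y}{\lambda}\right)$, $\lambda > 0$, to get

$$\int_{\mathbb{R}^n} f(x)(\mathcal{F}g_\lambda)(x)\,dx = \int_{\mathbb{R}^n} (\mathcal{F}f)(y)\,g_\lambda(y)\,dy \qquad \forall \lambda > 0.$$

The second part of Proposition 10.1 says $(\mathcal{F}g_\lambda)(x) = \lambda^n (\mathcal{F}g)(\lambda x)$. This implies

$$\int_{\mathbb{R}^n} f(x)\,\lambda^n (\mathcal{F}g)(\lambda x)\,dx = \int_{\mathbb{R}^n} (\mathcal{F}f)(y)\,g\left(\frac{y}{\lambda}\right) dy \qquad \forall \lambda > 0.$$

Now, we use the additional assumptions on $f, g$ to determine the limit $\lambda \to \infty$ of this identity. Since $f$ is continuous and bounded and since $\mathcal{F}g \in L^1(\mathbb{R}^n)$, a simple application of Lebesgue's dominated convergence theorem proves, by changing variables, $\xi = \lambda x$,

$$\lim_{\lambda \to \infty} \int_{\mathbb{R}^n} f(x)\,\lambda^n (\mathcal{F}g)(\lambda x)\,dx = \lim_{\lambda \to \infty} \int_{\mathbb{R}^n} f\left(\frac{\xi}{\lambda}\right) (\mathcal{F}g)(\xi)\,d\xi = f(0) \int_{\mathbb{R}^n} (\mathcal{F}g)(\xi)\,d\xi.$$

Similarly, the limit of the right-hand side is determined:

$$\lim_{\lambda \to \infty} \int_{\mathbb{R}^n} (\mathcal{F}f)(y)\,g\left(\frac{y}{\lambda}\right) dy = g(0) \int_{\mathbb{R}^n} (\mathcal{F}f)(y)\,dy.$$

Thus the identity of the second part follows.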
Theorem 10.1 (Inverse Fourier Transformation). 


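1. On $L^1(\mathbb{R}^n)$ define a map $\mathcal{L}$ by

$$(\mathcal{L}f)(x) = (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{\mathrm{i}x\cdot p} f(p)\,dp \qquad \forall\, x \in \mathbb{R}^n.$$

This map $\mathcal{L}$ maps $L^1(\mathbb{R}^n)$ into $C_b(\mathbb{R}^n)$ and satisfies

$$(\mathcal{L}f)(x) = (\mathcal{F}f)(-x) \qquad \forall\, x \in \mathbb{R}^n. \tag{10.2}$$

2. On the space of continuous bounded functions $f$ such that $f$ and $\mathcal{F}f$ belong to
$L^1(\mathbb{R}^n)$ one has

$$\mathcal{L}\mathcal{F}f = f \quad\text{and}\quad \mathcal{F}\mathcal{L}f = f,$$

hence on this space of functions, $\mathcal{L}$ is the inverse of the Fourier transformation $\mathcal{F}$.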


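Proof The proof of the first part is obvious. For the proof of the second part, we
observe that for every $x \in \mathbb{R}^n$ the translated function $f_{-x}$ has the same properties as
the function $f$ and that the relation $\mathcal{F}(f_{-x}) = e_x \cdot (\mathcal{F}f)$ holds, where $e_x$ denotes the
exponential function $e_x(p) = e^{\mathrm{i}x\cdot p}$. Now apply the second part of the Lemma to the
function $f_{-x}$ and any $g \in L^1(\mathbb{R}^n)$ which is bounded and continuous and for which
$\mathcal{F}g$ belongs to $L^1(\mathbb{R}^n)$ to obtain

$$f_{-x}(0)\int_{\mathbb{R}^n} (\mathcal{F}g)(p)\,dp = g(0)\int_{\mathbb{R}^n} (\mathcal{F}f_{-x})(p)\,dp = g(0)\int_{\mathbb{R}^n} e_x(p)\,(\mathcal{F}f)(p)\,dp,$$

or, by taking $f_{-x}(0) = f(x)$ into account,

$$f(x)\int_{\mathbb{R}^n} (\mathcal{F}g)(p)\,dp = g(0)\int_{\mathbb{R}^n} e^{\mathrm{i}x\cdot p}(\mathcal{F}f)(p)\,dp = g(0)\,(2\pi)^{n/2}\,(\mathcal{L}(\mathcal{F}f))(x).$$

Next choose a special function $g$ which satisfies all our hypotheses and for which
we can calculate the quantities involved explicitly: We choose for instance ($x = (x_1, \ldots, x_n)$)

$$g(x) = \prod_{k=1}^{n} e^{-a|x_k|}, \qquad a > 0.$$

In the Exercises, we show $I(g) = \int_{\mathbb{R}^n} (\mathcal{F}g)(p)\,dp = (2\pi)^{n/2}$ and thus, since $g(0) = 1$, we deduce
$f(x) = (\mathcal{L}(\mathcal{F}f))(x)$ for all $x \in \mathbb{R}^n$. With the help of the first part the second identity
follows easily: For all $p \in \mathbb{R}^n$ one has

$$(\mathcal{F}(\mathcal{L}f))(p) = (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-\mathrm{i}p\cdot x}(\mathcal{L}f)(x)\,dx = (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-\mathrm{i}p\cdot x}(\mathcal{F}f)(-x)\,dx = (\mathcal{L}(\mathcal{F}f))(p).$$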
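The inversion statement of Theorem 10.1 can be checked numerically in one dimension. The following is only a minimal sketch, assuming the normalization $(\mathcal{F}f)(p) = (2\pi)^{-1/2}\int e^{-\mathrm{i}px}f(x)\,dx$ used in this chapter; the helper names `fourier` and `inverse_fourier`, the integration window, and the test function `shifted` are illustrative choices, not part of the text.

```python
import cmath
import math

def fourier(f, p, half_width=10.0, steps=200):
    """Trapezoidal approximation of (F f)(p) = (2 pi)^(-1/2) * int exp(-i p x) f(x) dx.

    The window [-half_width, half_width] is an assumption that suits
    rapidly decaying inputs such as Gaussians.
    """
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * p * x) * f(x)
    return total * h / math.sqrt(2.0 * math.pi)

def inverse_fourier(g, x, half_width=10.0, steps=200):
    """(L g)(x) = (F g)(-x), which is relation (10.2)."""
    return fourier(g, -x, half_width, steps)

gauss = lambda x: math.exp(-0.5 * x * x)             # satisfies F gauss = gauss
shifted = lambda x: math.exp(-0.5 * (x - 1.0) ** 2)  # a non-even test function

# L(F f)(x0) should reproduce f(x0) up to quadrature error.
x0 = 0.3
recovered = inverse_fourier(lambda p: fourier(shifted, p), x0)
print(abs(fourier(gauss, 1.5) - math.exp(-0.5 * 1.5 ** 2)))
print(abs(recovered - shifted(x0)))
```

Both printed differences should be at the level of rounding error, because the trapezoidal rule converges very quickly for Gaussian-type integrands.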


10.2 Fourier Transformation on $\mathcal{S}(\mathbb{R}^n)$


As indicated earlier our goal in this chapter is to extend the definition and the study 
of the Fourier transformation on a suitable space of distributions. Certainly, this 
extension has to be done in such a way that it is compatible with the embedding of 
integrable functions into the space of distributions and the Fourier transformation 
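on integrable functions we have studied in the previous section. From the Riemann–Lebesgue
lemma it follows that $\tilde f = \mathcal{F}f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ whenever $f \in L^1(\mathbb{R}^n)$. Thus,
the regular distribution $I_{\tilde f}$ is well defined. In the Exercises, we show

$$\langle I_{\tilde f}, \phi\rangle = \langle I_f, \mathcal{F}\phi\rangle \qquad \forall\,\phi \in \mathcal{D}(\mathbb{R}^n).$$

If $\mathcal{F}'$ denotes the Fourier transformation on distributions we want to define, the
compatibility with the embedding requires

$$\mathcal{F}'I_f = I_{\mathcal{F}f} \qquad \forall\, f \in L^1(\mathbb{R}^n).$$

Accordingly one should define $\mathcal{F}'$ as follows:

$$\langle \mathcal{F}'T, \phi\rangle = \langle T, \mathcal{F}\phi\rangle \qquad \forall\,\phi \in \mathcal{T}(\mathbb{R}^n),\ \forall\, T \in \mathcal{T}'(\mathbb{R}^n), \tag{10.3}$$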




where $\mathcal{T}(\mathbb{R}^n)$ denotes the test function space of the distribution space $\mathcal{T}'(\mathbb{R}^n)$ on
which one can define the Fourier transformation naturally.

In the Exercises, we show: If $\phi \in \mathcal{D}(\mathbb{R}^n)$, $\phi \neq 0$, then $\mathcal{F}\phi$ is an entire analytic
function different from 0 and thus does not belong to $\mathcal{D}(\mathbb{R}^n)$, so that the right-hand
side of Eq. (10.3) is not defined in general in this case. We conclude that we cannot
define the Fourier transformation $\mathcal{F}'$ naturally on $\mathcal{D}'(\mathbb{R}^n)$.

Equation (10.3) also indicates that the test function space $\mathcal{T}(\mathbb{R}^n)$ should have the
property that the Fourier transformation maps this space into itself and is continuous
in order that this definition be effective. In this section, we will learn that this is the
case for the test function space $\mathcal{T}(\mathbb{R}^n) = \mathcal{S}(\mathbb{R}^n)$ and thus the space of tempered
distributions becomes the natural and effective distribution space on which one studies
the Fourier transformation.

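Recall that the elements of the test function space $\mathcal{S}(\mathbb{R}^n)$ of strongly decreasing
$C^\infty$-functions are characterized by condition (2.10). An equivalent way is to say: A
function $\phi \in C^\infty(\mathbb{R}^n)$ belongs to $\mathcal{S}(\mathbb{R}^n)$ if and only if

$$\forall\,\alpha \in \mathbb{N}^n\ \ \forall\,\beta \in \mathbb{N}^n\ \ \exists\, C_{\alpha,\beta} \in \mathbb{R}_+\ \ \forall\, x \in \mathbb{R}^n: \quad |x^\beta D^\alpha\phi(x)| \le C_{\alpha,\beta}. \tag{10.4}$$

Recall furthermore that the topology on $\mathcal{S}(\mathbb{R}^n)$ is defined by the norms $p_{m,l}$, $m, l = 0, 1, 2, \ldots$, where

$$p_{m,l}(\phi) = \sup\bigl\{(1 + x^2)^{m/2}\,|D^\alpha\phi(x)| : x \in \mathbb{R}^n,\ |\alpha| \le l\bigr\}.$$

An easy consequence is the following invariance property of $\mathcal{S}(\mathbb{R}^n)$:

$$\phi \in \mathcal{S}(\mathbb{R}^n),\ \alpha, \beta \in \mathbb{N}^n \ \Longrightarrow\ x^\beta D^\alpha\phi \in \mathcal{S}(\mathbb{R}^n)$$

and

$$p_{m,l}(x^\beta D^\alpha\phi) \le c\, p_{m+|\beta|,\,l+|\alpha|}(\phi). \tag{10.5}$$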


In the previous section, we learned that the Fourier transformation is invertible on
a certain subspace of $L^1(\mathbb{R}^n)$. Here we are going to show that the test function
space $\mathcal{S}(\mathbb{R}^n)$ is contained in this subspace. As a first step we observe that $\mathcal{S}(\mathbb{R}^n)$ is
continuously embedded into $L^1(\mathbb{R}^n)$ by the identity map:

$$\mathcal{S}(\mathbb{R}^n) \subset L^1(\mathbb{R}^n), \qquad \|\phi\|_1 \le C\, p_{n+1,0}(\phi) \qquad \forall\,\phi \in \mathcal{S}(\mathbb{R}^n). \tag{10.6}$$

Here the embedding constant $C$ depends only on the dimension $n$:

$$C = \int_{\mathbb{R}^n} \frac{dx}{(1 + x^2)^{\frac{n+1}{2}}}.$$

This is shown in the Exercises.
Theorem 10.2 (Fourier Transformation on $\mathcal{S}(\mathbb{R}^n)$)

1. The Fourier transformation $\mathcal{F}$ is an isomorphism on $\mathcal{S}(\mathbb{R}^n)$, i.e., a continuous
bijective mapping with continuous inverse.

2. The inverse of $\mathcal{F}$ is the map $\mathcal{L}$ introduced in Eq. (10.2).

3. The following relations hold for all $\phi \in \mathcal{S}(\mathbb{R}^n)$, $p \in \mathbb{R}^n$, and $\alpha \in \mathbb{N}^n$:
a) $D^\alpha(\mathcal{F}\phi)(p) = \mathcal{F}((-\mathrm{i}x)^\alpha\phi)(p)$.
b) $\mathcal{F}(D^\alpha\phi)(p) = (\mathrm{i}p)^\alpha(\mathcal{F}\phi)(p)$.


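Proof In a first step we show that the Fourier transformation $\mathcal{F}$ is a continuous
linear map from $\mathcal{S}(\mathbb{R}^n)$ into $\mathcal{S}(\mathbb{R}^n)$. Take any $\phi \in \mathcal{S}(\mathbb{R}^n)$ and any $\alpha, \beta \in \mathbb{N}^n$. Then
we know $x^\beta D^\alpha\phi \in \mathcal{S}(\mathbb{R}^n)$ and the combination of the estimates (10.5) and (10.6)
implies

$$\|x^\beta D^\alpha\phi\|_1 \le C\, p_{n+1+|\beta|,\,|\alpha|}(\phi). \tag{10.7}$$

Hence, parts 4) and 5) of Proposition 10.1 can be applied repeatedly, to every order,
and thus it follows that

$$D^\alpha(\mathcal{F}\phi)(p) = \mathcal{F}((-\mathrm{i}x)^\alpha\phi)(p) \qquad \forall\, p \in \mathbb{R}^n,\ \forall\,\alpha \in \mathbb{N}^n.$$

We deduce $\mathcal{F}\phi \in C^\infty(\mathbb{R}^n)$ and relation a) of part 3) holds.
Similarly one shows for all $\alpha, \beta \in \mathbb{N}^n$ and all $p \in \mathbb{R}^n$,

$$p^\beta D^\alpha(\mathcal{F}\phi)(p) = p^\beta\,\mathcal{F}((-\mathrm{i}x)^\alpha\phi)(p) = \mathcal{F}\bigl((-\mathrm{i}D)^\beta((-\mathrm{i}x)^\alpha\phi)\bigr)(p). \tag{10.8}$$

Choosing $\alpha = 0$ in Eq. (10.8) implies relation b) of part 3). Equation (10.8) also
implies

$$|p^\beta D^\alpha(\mathcal{F}\phi)(p)| \le \|D^\beta(x^\alpha\phi)\|_1$$

and therefore by estimate (10.7), for all $m, l = 0, 1, 2, \ldots$ and all $\phi \in \mathcal{S}(\mathbb{R}^n)$,

$$p_{m,l}(\mathcal{F}\phi) \le C\, p_{n+1+l,\,m}(\phi), \tag{10.9}$$

where the constant $C$ depends only on $m, n, l$. This estimate implies $\mathcal{F}\phi \in \mathcal{S}(\mathbb{R}^n)$. It
follows easily that $\mathcal{F}$ is linear. Hence, this estimate also implies that $\mathcal{F}$ is bounded
and thus continuous.

Since we know $(\mathcal{L}\phi)(p) = (\mathcal{F}\phi)(-p)$ on $\mathcal{S}(\mathbb{R}^n)$, it follows that the map $\mathcal{L}$ has the
same properties as $\mathcal{F}$. The estimate above shows in addition that $\mathcal{S}(\mathbb{R}^n)$ is contained
in the subspace of $L^1(\mathbb{R}^n)$ on which $\mathcal{F}$ is invertible. We conclude that the continuous
linear map $\mathcal{L}$ on $\mathcal{S}(\mathbb{R}^n)$ is the inverse of the Fourier transformation on this space.
This concludes the proof of the theorem.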
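The two rules in part 3) of Theorem 10.2 can be illustrated numerically for $n = 1$ and $\alpha = 1$. This is a rough sketch under stated assumptions: the quadrature helper `fourier`, the window size, the step `h` of the central difference, and the choice $\phi(x) = e^{-x^2/2}$ (for which $\mathcal{F}\phi = \phi$ in the normalization of this chapter) are all illustrative.

```python
import cmath
import math

def fourier(f, p, half_width=12.0, steps=240):
    """Trapezoidal approximation of (F f)(p) = (2 pi)^(-1/2) * int exp(-i p x) f(x) dx."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * p * x) * f(x)
    return total * h / math.sqrt(2.0 * math.pi)

phi = lambda x: math.exp(-0.5 * x * x)        # test function with F phi = phi
dphi = lambda x: -x * math.exp(-0.5 * x * x)  # its derivative D phi

p = 1.2
h = 1e-4
# Rule a): D(F phi)(p) = F((-i x) phi)(p); left side via a central difference in p.
rule_a_lhs = (fourier(phi, p + h) - fourier(phi, p - h)) / (2.0 * h)
rule_a_rhs = fourier(lambda x: -1j * x * phi(x), p)
# Rule b): F(D phi)(p) = (i p) (F phi)(p).
rule_b_lhs = fourier(dphi, p)
rule_b_rhs = 1j * p * fourier(phi, p)

print(abs(rule_a_lhs - rule_a_rhs), abs(rule_b_lhs - rule_b_rhs))
```

For this particular $\phi$ both sides of rule a) equal $-p\,e^{-p^2/2}$, which gives an independent closed-form check.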

On the test function space $\mathcal{S}(\mathbb{R}^n)$ we have introduced two products, the standard
pointwise product and the convolution product. As one would expect on the basis 
of part 3) of Proposition 10.1 the Fourier transformation transforms the convolution 
product into the pointwise product and conversely. More precisely we have the 
following. 




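Corollary 10.1

1. The Fourier transformation $\mathcal{F}$ and its inverse $\mathcal{L}$ are related on $\mathcal{S}(\mathbb{R}^n)$ as follows,
$u \in \mathcal{S}(\mathbb{R}^n)$:

$$\mathcal{L}u = \mathcal{F}\check{u} = (\mathcal{F}u)^{\vee}, \qquad \mathcal{L}\mathcal{F}u = \mathcal{F}\mathcal{L}u = u, \qquad \mathcal{F}\mathcal{F}u = \check{u} = \mathcal{L}\mathcal{L}u,$$

where $\check{u}(x) = u(-x)$ for all $x \in \mathbb{R}^n$.
2. For all $\phi, \psi \in \mathcal{S}(\mathbb{R}^n)$ the following relations hold:

$$\mathcal{F}(\phi * \psi) = (2\pi)^{n/2}\,(\mathcal{F}\phi)\cdot(\mathcal{F}\psi),$$
$$\mathcal{F}(\phi\cdot\psi) = (2\pi)^{-n/2}\,(\mathcal{F}\phi) * (\mathcal{F}\psi).$$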


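Proof The first identity in the first part is immediate from the definitions of the maps
involved. The second repeats the fact that $\mathcal{L}$ is the inverse of $\mathcal{F}$ on $\mathcal{S}(\mathbb{R}^n)$. The third
identity is a straightforward consequence of the first two.

In order to prove the second part, recall that by part 3) of Proposition 10.1 the first
identity is known for functions in $L^1(\mathbb{R}^n)$, and we know that $\mathcal{S}(\mathbb{R}^n)$ is continuously
embedded into $L^1(\mathbb{R}^n)$. Furthermore, we know from Proposition 7.3 that $\phi * \psi \in \mathcal{S}(\mathbb{R}^n)$.
This proves that the first identity is actually an identity in $\mathcal{S}(\mathbb{R}^n)$ and not only
in $L^1(\mathbb{R}^n)$.

Now replace in the first identity of the second part the function $\phi$ with $\mathcal{L}\phi$ and
the function $\psi$ with $\mathcal{L}\psi$ to obtain

$$\mathcal{F}((\mathcal{L}\phi) * (\mathcal{L}\psi)) = (2\pi)^{n/2}\,(\mathcal{F}(\mathcal{L}\phi))\cdot(\mathcal{F}(\mathcal{L}\psi)) = (2\pi)^{n/2}\,\phi\cdot\psi.$$

It follows that $\mathcal{F}(\phi\cdot\psi) = (2\pi)^{-n/2}\,\mathcal{F}(\mathcal{F}((\mathcal{L}\phi) * (\mathcal{L}\psi)))$ and thus, taking the first part into account,

$$\mathcal{F}(\phi\cdot\psi) = (2\pi)^{-n/2}\bigl((\mathcal{L}\phi) * (\mathcal{L}\psi)\bigr)^{\vee} = (2\pi)^{-n/2}\,(\mathcal{L}\phi)^{\vee} * (\mathcal{L}\psi)^{\vee} = (2\pi)^{-n/2}\,(\mathcal{F}\phi) * (\mathcal{F}\psi),$$

hence $\mathcal{F}(\phi\cdot\psi) = (2\pi)^{-n/2}\,\mathcal{F}\phi * \mathcal{F}\psi$.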
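As a plausibility check of the first relation in part 2 of Corollary 10.1, one can compare both sides numerically for $n = 1$. The sketch below is not taken from the text: the helpers `integrate`, `fourier`, and `convolve`, the window `HALF_WIDTH`, the step count `STEPS`, and the two sample functions are assumptions made only for illustration.

```python
import cmath
import math

HALF_WIDTH = 12.0  # integration window [-12, 12]; an assumption suited to Gaussian-like inputs
STEPS = 240        # number of trapezoidal steps (step size 0.1)

def integrate(func):
    """Trapezoidal rule for func on [-HALF_WIDTH, HALF_WIDTH]."""
    h = 2.0 * HALF_WIDTH / STEPS
    total = 0j
    for k in range(STEPS + 1):
        x = -HALF_WIDTH + k * h
        total += (0.5 if k in (0, STEPS) else 1.0) * func(x)
    return total * h

def fourier(f, p):
    """(F f)(p) = (2 pi)^(-1/2) * int exp(-i p x) f(x) dx."""
    return integrate(lambda x: cmath.exp(-1j * p * x) * f(x)) / math.sqrt(2.0 * math.pi)

def convolve(f, g):
    """Return the function x -> (f * g)(x) = int f(x - y) g(y) dy."""
    return lambda x: integrate(lambda y: f(x - y) * g(y))

phi = lambda x: math.exp(-0.5 * x * x)
psi = lambda x: x * math.exp(-(x - 0.5) ** 2)

p = 0.8
lhs = fourier(convolve(phi, psi), p)                                 # F(phi * psi)(p)
rhs = math.sqrt(2.0 * math.pi) * fourier(phi, p) * fourier(psi, p)   # (2 pi)^(1/2) (F phi)(p) (F psi)(p)
print(abs(lhs - rhs))
```

The printed difference should be close to machine precision; the factor $(2\pi)^{1/2}$ is exactly the $n = 1$ case of $(2\pi)^{n/2}$.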


10.3 Fourier Transformation for Tempered Distributions


According to the previous section the Fourier transformation is an isomorphism of
the test function space $\mathcal{S}(\mathbb{R}^n)$, hence it can be extended to the space of tempered
distributions $\mathcal{S}'(\mathbb{R}^n)$ by the standard duality method. After the formal definition has
been given we look at some simple examples to illustrate how this definition works
in practice. Then several important general results about the Fourier transformation
on $\mathcal{S}'(\mathbb{R}^n)$ are discussed.


Definition 10.1 The Fourier transform $\tilde T = \mathcal{F}'T$ of a tempered distribution $T \in \mathcal{S}'(\mathbb{R}^n)$
is defined by the relation

$$\langle \mathcal{F}'T, \phi\rangle = \langle T, \mathcal{F}\phi\rangle \qquad \forall\,\phi \in \mathcal{S}(\mathbb{R}^n). \tag{10.10}$$




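Example 10.2

1. Dirac's delta distribution is obviously tempered and thus it has a Fourier transform
according to the definition given above. The actual calculation is very simple: For
all $\phi \in \mathcal{S}(\mathbb{R}^n)$ one has

$$\langle \mathcal{F}'\delta, \phi\rangle = \langle \delta, \mathcal{F}\phi\rangle = (\mathcal{F}\phi)(0) = (2\pi)^{-n/2}\int_{\mathbb{R}^n} \phi(x)\,dx = (2\pi)^{-n/2}\langle I_1, \phi\rangle,$$

hence

$$\mathcal{F}'\delta = (2\pi)^{-n/2} I_1,$$

i.e., the Fourier transform of Dirac's delta distribution is the constant distribution.
This is often written as $\mathcal{F}'\delta = (2\pi)^{-n/2}$.

2. Next we calculate the Fourier transform of a constant distribution $I_c$, $c \in \mathbb{C}$.
According to the previous example we expect it to be proportional to Dirac's
delta distribution. Indeed one finds for all $\phi \in \mathcal{S}(\mathbb{R}^n)$,

$$\langle \mathcal{F}'I_c, \phi\rangle = \langle I_c, \mathcal{F}\phi\rangle = c\int_{\mathbb{R}^n} (\mathcal{F}\phi)(p)\,dp = c\,(2\pi)^{n/2}(\mathcal{L}\mathcal{F}\phi)(0) = c\,(2\pi)^{n/2}\phi(0) = \langle c\,(2\pi)^{n/2}\delta, \phi\rangle,$$

i.e.,

$$\mathcal{F}'I_c = c\,(2\pi)^{n/2}\delta.$$

3. Another simple example of a tempered distribution is the Heaviside function $\theta$.
It certainly has no Fourier transform in the classical sense. We determine here its
Fourier transform in the sense of tempered distributions. The calculations contain
a new element, namely a suitable limit procedure. For all $\phi \in \mathcal{S}(\mathbb{R})$ we find

$$\langle \mathcal{F}'\theta, \phi\rangle = \langle \theta, \mathcal{F}\phi\rangle = \int_0^\infty (\mathcal{F}\phi)(p)\,dp = \lim_{r\downarrow 0}\int_0^\infty e^{-rp}(\mathcal{F}\phi)(p)\,dp.$$

For fixed $r > 0$ we apply Fubini's theorem to exchange the order of integration
so that one of the integrals can be calculated explicitly. The result is

$$\int_0^\infty e^{-rp}(\mathcal{F}\phi)(p)\,dp = \int_0^\infty e^{-rp}\Bigl((2\pi)^{-1/2}\int_{\mathbb{R}} e^{-\mathrm{i}px}\phi(x)\,dx\Bigr)dp$$
$$= \int_{\mathbb{R}} \phi(x)\Bigl((2\pi)^{-1/2}\int_0^\infty e^{-rp-\mathrm{i}px}\,dp\Bigr)dx = (2\pi)^{-1/2}\int_{\mathbb{R}} \frac{-\mathrm{i}}{x - \mathrm{i}r}\,\phi(x)\,dx,$$

hence

$$(\mathcal{F}'\theta)(x) = \lim_{r\downarrow 0}\frac{-\mathrm{i}}{\sqrt{2\pi}}\,\frac{1}{x - \mathrm{i}r} \equiv \frac{-\mathrm{i}}{\sqrt{2\pi}}\,\frac{1}{x - \mathrm{i}o}.$$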
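The regularized identity in item 3 can be tested numerically for a fixed $r > 0$. The following sketch assumes the test function $\phi(x) = e^{-x^2/2}$, for which $(\mathcal{F}\phi)(p) = e^{-p^2/2}$ in the normalization of this chapter; the helper `trapezoid`, the value `r = 0.5`, and the integration limits are illustrative choices.

```python
import math

def trapezoid(func, a, b, steps):
    """Composite trapezoidal rule for func on [a, b]; func may be complex-valued."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for k in range(1, steps):
        total += func(a + k * h)
    return total * h

r = 0.5
# Left side: int_0^infty exp(-r p) (F phi)(p) dp with (F phi)(p) = exp(-p^2 / 2).
lhs = trapezoid(lambda p: math.exp(-r * p) * math.exp(-0.5 * p * p), 0.0, 12.0, 12000)
# Right side: (2 pi)^(-1/2) int (-i / (x - i r)) phi(x) dx over the real line.
rhs = trapezoid(lambda x: (-1j / (x - 1j * r)) * math.exp(-0.5 * x * x),
                -12.0, 12.0, 2400) / math.sqrt(2.0 * math.pi)

print(lhs, rhs)
```

The imaginary part of `rhs` cancels by symmetry, and its real part should agree with `lhs` up to the quadrature error of the one-sided integral.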
By duality the properties of the Fourier transformation on $\mathcal{S}(\mathbb{R}^n)$ as expressed in
Theorem 10.2 are easily translated into similar properties of the Fourier transformation
on the space of tempered distributions $\mathcal{S}'(\mathbb{R}^n)$.


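Theorem 10.3 (Fourier Transformation on $\mathcal{S}'(\mathbb{R}^n)$)

1. The Fourier transformation $\mathcal{F}'$ is an isomorphism of $\mathcal{S}'(\mathbb{R}^n)$. It is compatible with
the embedding of integrable functions: For all $f \in L^1(\mathbb{R}^n)$ we have

$$\mathcal{F}'I_f = I_{\mathcal{F}f}.$$

2. The inverse of $\mathcal{F}'$ is the dual $\mathcal{L}'$ of the inverse $\mathcal{L}$ of $\mathcal{F}$, i.e., $\mathcal{F}'^{-1} = \mathcal{L}'$.
3. The following rules hold, $\alpha \in \mathbb{N}^n$:

$$\mathcal{F}'(D_x^\alpha T)(p) = (\mathrm{i}p)^\alpha(\mathcal{F}'T)(p), \qquad D_p^\alpha(\mathcal{F}'T)(p) = \mathcal{F}'((-\mathrm{i}x)^\alpha T)(p).$$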


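Proof In the Exercises, we show: if $I$ is an isomorphism of the HLCTVS $E$, then its
dual $I'$ is an isomorphism of the topological dual space $E'$ equipped with the topology
of pointwise convergence (weak topology $\sigma$). Thus, we deduce from Theorem 10.2
that $\mathcal{F}'$ is an isomorphism of $\mathcal{S}'(\mathbb{R}^n)$.

Next consider any $f \in L^1(\mathbb{R}^n)$. We know that its Fourier transform $\mathcal{F}f$ is a
bounded continuous and thus locally integrable function which defines the tempered
distribution $I_{\mathcal{F}f}$. For all $\phi \in \mathcal{S}(\mathbb{R}^n)$ a simple application of Fubini's theorem shows
that

$$\langle \mathcal{F}'I_f, \phi\rangle = \langle I_f, \mathcal{F}\phi\rangle = \int_{\mathbb{R}^n} f(x)(\mathcal{F}\phi)(x)\,dx = \int_{\mathbb{R}^n} f(x)\Bigl(\int_{\mathbb{R}^n} e^{-\mathrm{i}p\cdot x}\phi(p)\,\frac{dp}{(2\pi)^{n/2}}\Bigr)dx$$
$$= \int_{\mathbb{R}^n}\Bigl(\int_{\mathbb{R}^n} f(x)\,e^{-\mathrm{i}p\cdot x}\,\frac{dx}{(2\pi)^{n/2}}\Bigr)\phi(p)\,dp = \int_{\mathbb{R}^n} (\mathcal{F}f)(p)\,\phi(p)\,dp = \langle I_{\mathcal{F}f}, \phi\rangle.$$

This implies compatibility of the Fourier transformations on $L^1(\mathbb{R}^n)$ and on $\mathcal{S}'(\mathbb{R}^n)$
and thus part 1) has been shown.

In order to prove part 2) take any $T \in \mathcal{S}'(\mathbb{R}^n)$ and calculate for all $\phi \in \mathcal{S}(\mathbb{R}^n)$, using
Theorem 10.2, $\langle \mathcal{L}'\mathcal{F}'T, \phi\rangle = \langle \mathcal{F}'T, \mathcal{L}\phi\rangle = \langle T, \mathcal{F}\mathcal{L}\phi\rangle = \langle T, \phi\rangle$, thus $\mathcal{L}'\mathcal{F}' = \mathrm{id}$. The
same calculation with the roles of $\mathcal{F}$ and $\mathcal{L}$ exchanged gives $\mathcal{F}'\mathcal{L}' = \mathrm{id}$. It
follows that $\mathcal{L}'$ is the inverse of $\mathcal{F}'$.

Finally, we establish the rules of part 3) relying on the corresponding rules as
stated in Theorem 10.2: Take any $T \in \mathcal{S}'(\mathbb{R}^n)$ and any $\phi \in \mathcal{S}(\mathbb{R}^n)$ and use the
definitions, respectively the established rules, to get

$$\langle \mathcal{F}'(D_x^\alpha T), \phi\rangle = \langle D_x^\alpha T, \mathcal{F}\phi\rangle = (-1)^{|\alpha|}\langle T, D_x^\alpha(\mathcal{F}\phi)\rangle$$
$$= (-1)^{|\alpha|}\langle T, \mathcal{F}((-\mathrm{i}p)^\alpha\phi)\rangle = \langle \mathcal{F}'T, (\mathrm{i}p)^\alpha\phi\rangle = \langle (\mathrm{i}p)^\alpha\mathcal{F}'T, \phi\rangle.$$

Since this identity holds for every $\phi \in \mathcal{S}(\mathbb{R}^n)$ the first relation is proven. Similarly
we proceed with the second.

$$\langle D_p^\alpha(\mathcal{F}'T), \phi\rangle = (-1)^{|\alpha|}\langle \mathcal{F}'T, D_p^\alpha\phi\rangle = (-1)^{|\alpha|}\langle T, \mathcal{F}(D_p^\alpha\phi)\rangle$$
$$= (-1)^{|\alpha|}\langle T, (\mathrm{i}x)^\alpha\mathcal{F}\phi\rangle = \langle (-\mathrm{i}x)^\alpha T, \mathcal{F}\phi\rangle = \langle \mathcal{F}'((-\mathrm{i}x)^\alpha T), \phi\rangle.$$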


As a simple illustration of the rules in part 3) we mention the following. Apply the
first rule to Dirac's delta distribution. Recalling the relation $\mathcal{F}'\delta = (2\pi)^{-n/2}$ we get

$$\mathcal{F}'(D^\alpha\delta)(p) = (2\pi)^{-n/2}\,(\mathrm{i}p)^\alpha. \tag{10.11}$$

Similarly, applying the second rule to the constant distribution $T = I_1$ produces the
relation

$$\mathcal{F}'((-\mathrm{i}x)^\alpha)(p) = (2\pi)^{n/2}\,D^\alpha\delta(p). \tag{10.12}$$




Certainly, these convenient rules have no counterpart in the classical theory of 
Fourier transformation. Further applications are discussed in the Exercises. 

In Corollary 10.1, we learned that the Fourier transformation $\mathcal{F}$ transforms a
convolution of test functions $\phi, \psi$ into a pointwise product: $\mathcal{F}(\phi * \psi) = (2\pi)^{n/2}(\mathcal{F}\phi)\cdot(\mathcal{F}\psi)$.
Since we have also learned that the convolution and the pointwise product of
distributions are naturally defined only in special cases, we cannot expect this relation
to hold for distributions in general. However, there is an important class for which one
can show this relation to hold for distributions too: One distribution is tempered and
the other has a compact support. As preparation we show that the Fourier transform
of a distribution of compact support is a multiplier for tempered distributions, i.e., a
$C^\infty$-function with polynomially bounded derivatives.

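To begin we note

Lemma 10.3 For $p \in \mathbb{R}^n$ define a function $e_p : \mathbb{R}^n \to \mathbb{C}$ by $e_p(x) = \dfrac{e^{-\mathrm{i}p\cdot x}}{(2\pi)^{n/2}}$.
Suppose $T \in \mathcal{D}'(\mathbb{R}^n)$ is a distribution with support contained in the compact set
$K \subset \mathbb{R}^n$. For any function $u \in \mathcal{D}(\mathbb{R}^n)$ define a function $T_u : \mathbb{R}^n \to \mathbb{C}$ by

$$T_u(p) = \langle T, e_p \cdot u\rangle \qquad \forall\, p \in \mathbb{R}^n.$$

Then the following holds.

1. $T_u \in \mathcal{O}_m(\mathbb{R}^n)$, i.e., $T_u$ is a $C^\infty$-function with polynomially bounded derivatives.
2. If $u, v \in \mathcal{D}(\mathbb{R}^n)$ satisfy $u(x) = v(x) = 1$ for all $x \in K$, then $T_u = T_v$.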


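Proof Since for each $p \in \mathbb{R}^n$ the function $e_p \cdot u$ belongs to $\mathcal{D}(\mathbb{R}^n)$ if $u$ does, the
function $T_u$ is well defined for $u \in \mathcal{D}(\mathbb{R}^n)$. As in Theorem 7.1 it follows that $T_u$ is a
$C^\infty$-function and

$$D^\alpha T_u(p) = \langle T, D_p^\alpha(e_p \cdot u)\rangle = \langle T, e_p \cdot (-\mathrm{i}x)^\alpha \cdot u\rangle \qquad \forall\,\alpha \in \mathbb{N}^n.$$

Since $T$ has its support in the compact set $K$, there are $m \in \mathbb{N}$ and a constant $C$ such
that $|\langle T, \phi\rangle| \le C\,p_{K,m}(\phi)$ for all $\phi \in \mathcal{D}_K(\mathbb{R}^n)$. It follows that, for all $p \in \mathbb{R}^n$,

$$|D^\alpha T_u(p)| \le C\,p_{K,m}(e_p \cdot (-\mathrm{i}x)^\alpha \cdot u). \tag{10.13}$$

As we show in the Exercises, the right-hand side of this inequality is a polynomially
bounded function of $p \in \mathbb{R}^n$. It follows that $T_u \in \mathcal{O}_m(\mathbb{R}^n)$. This proves the first part.

If two functions $u, v \in \mathcal{D}(\mathbb{R}^n)$ are equal to 1 on $K$, then, for every $p \in \mathbb{R}^n$, the
function $e_p \cdot (u - v)$ vanishes on a neighborhood of the support of the distribution $T$
and hence $\langle T, e_p \cdot (u - v)\rangle = 0$. Linearity of $T$ implies $T_u = T_v$.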


Theorem 10.4 A distribution $T \in \mathcal{D}'(\mathbb{R}^n)$ of compact support is tempered and its
Fourier transform $\tilde T = \mathcal{F}'(T)$ is a $C^\infty$-function such that all derivatives $D^\alpha\tilde T(p)$ are
polynomially bounded, i.e., $\tilde T \in \mathcal{O}_m(\mathbb{R}^n)$ (see also Proposition 4.3).


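Proof A distribution $T$ with compact support is an element of the dual $\mathcal{E}'(\mathbb{R}^n)$ of
the test function space $\mathcal{E}(\mathbb{R}^n)$, according to Theorem 3.6. Since $\mathcal{S}(\mathbb{R}^n) \subset \mathcal{E}(\mathbb{R}^n)$ with
continuous identity map, it follows that $\mathcal{E}'(\mathbb{R}^n) \subset \mathcal{S}'(\mathbb{R}^n)$. Therefore, a distribution
with compact support is tempered and thus has a well-defined Fourier transform.

Suppose $T \in \mathcal{D}'(\mathbb{R}^n)$ has its support in the compact set $K \subset \mathbb{R}^n$. Choose any
$u \in \mathcal{D}(\mathbb{R}^n)$ with the property $u(x) = 1$ for all $x \in K$ and define the function $T_u$ as in
the previous lemma. It follows that $T_u \in \mathcal{O}_m(\mathbb{R}^n)$ and we claim

$$\mathcal{F}'T = I_{T_u},$$

i.e., for all $\phi \in \mathcal{S}(\mathbb{R}^n)$,

$$\langle \mathcal{F}'T, \phi\rangle = \int_{\mathbb{R}^n} T_u(p)\,\phi(p)\,dp.$$

According to the specification of $u$ we know

$$\langle u \cdot T, \mathcal{F}\phi\rangle = \langle T, \mathcal{F}\phi\rangle = \langle \mathcal{F}'T, \phi\rangle.$$

Now observe $\mathcal{F}\phi(x) = \int_{\mathbb{R}^n} e_p(x)\,\phi(p)\,dp$ and thus

$$\langle u \cdot T, \mathcal{F}\phi\rangle = \Bigl\langle (u \cdot T)(x), \int_{\mathbb{R}^n} e_p(x)\phi(p)\,dp\Bigr\rangle = \Bigl\langle T(x), u(x)\int_{\mathbb{R}^n} e_p(x)\phi(p)\,dp\Bigr\rangle$$
$$= \int_{\mathbb{R}^n} \langle T(x), u(x)e_p(x)\rangle\,\phi(p)\,dp = \int_{\mathbb{R}^n} T_u(p)\,\phi(p)\,dp.$$

In the second but last step we used Eq. (7.6). This gives $\langle \mathcal{F}'T, \phi\rangle = \int_{\mathbb{R}^n} T_u(p)\phi(p)\,dp$
for all $\phi \in \mathcal{S}(\mathbb{R}^n)$ and thus proves $\mathcal{F}'T = I_{T_u}$. The previous lemma now gives the
conclusion.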

As further preparation we present a result which is also of considerable interest
in itself since it controls the convolution of distributions, in $\mathcal{S}'(\mathbb{R}^n)$ and in $\mathcal{E}'(\mathbb{R}^n)$,
with test functions in $\mathcal{S}(\mathbb{R}^n)$.


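Proposition 10.2 The convolution of a tempered distribution $T \in \mathcal{S}'(\mathbb{R}^n)$ with a
test function $\psi \in \mathcal{S}(\mathbb{R}^n)$ is a tempered distribution $T * \psi$ which has the Fourier
transform

$$\mathcal{F}'(T * \psi) = (2\pi)^{n/2}\,(\mathcal{F}'T)\cdot(\mathcal{F}\psi).$$

In particular, if $T \in \mathcal{E}'(\mathbb{R}^n)$, then $T * \psi \in \mathcal{S}(\mathbb{R}^n)$.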


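Proof The convolution $T * \psi$ is defined by

$$\langle T * \psi, \phi\rangle = \langle T, \check\psi * \phi\rangle \qquad \forall\,\phi \in \mathcal{S}(\mathbb{R}^n).$$

Since we have learned that, for fixed $\psi \in \mathcal{S}(\mathbb{R}^n)$, $\phi \mapsto \check\psi * \phi$ is a continuous linear
map from $\mathcal{S}(\mathbb{R}^n)$ into itself (see Proposition 7.3) it follows that $T * \psi$ is well defined
as a tempered distribution. For its Fourier transform we find, using Corollary 10.1
and Theorem 10.3,

$$\langle \mathcal{F}'(T * \psi), \phi\rangle = \langle T * \psi, \mathcal{F}\phi\rangle = \langle T, \check\psi * \mathcal{F}\phi\rangle$$
$$= \langle \mathcal{F}'T, \mathcal{L}(\check\psi * \mathcal{F}\phi)\rangle = (2\pi)^{n/2}\langle \mathcal{F}'T, (\mathcal{L}\check\psi)\cdot(\mathcal{L}\mathcal{F}\phi)\rangle$$
$$= (2\pi)^{n/2}\langle \mathcal{F}'T, (\mathcal{F}\psi)\cdot\phi\rangle = (2\pi)^{n/2}\langle (\mathcal{F}\psi)\cdot(\mathcal{F}'T), \phi\rangle.$$

This implies

$$\mathcal{F}'(T * \psi) = (2\pi)^{n/2}\,(\mathcal{F}'T)\cdot(\mathcal{F}\psi) \qquad \forall\, T \in \mathcal{S}'(\mathbb{R}^n),\ \forall\,\psi \in \mathcal{S}(\mathbb{R}^n).$$

If $T \in \mathcal{E}'(\mathbb{R}^n)$, then $\mathcal{F}'T \in \mathcal{O}_m(\mathbb{R}^n)$ by Theorem 10.4 and thus $(\mathcal{F}'T)\cdot(\mathcal{F}\psi) \in \mathcal{S}(\mathbb{R}^n)$,
hence $\mathcal{F}'(T * \psi) \in \mathcal{S}(\mathbb{R}^n)$. Since $\mathcal{F}$ is an isomorphism of $\mathcal{S}(\mathbb{R}^n)$, this gives $T * \psi \in \mathcal{S}(\mathbb{R}^n)$.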


Theorem 10.5 (Convolution Theorem) The convolution $T * S$ of a tempered distribution
$T \in \mathcal{S}'(\mathbb{R}^n)$ and a compactly supported distribution $S \in \mathcal{E}'(\mathbb{R}^n)$ is a tempered
distribution whose Fourier transform is

$$\mathcal{F}'(T * S) = (2\pi)^{n/2}\,(\mathcal{F}'T)\cdot(\mathcal{F}'S). \tag{10.14}$$


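Proof Since $S \in \mathcal{E}'(\mathbb{R}^n)$, Proposition 10.2 ensures that

$$x \mapsto \langle S(y), \phi(x + y)\rangle = (\check S * \phi)(x)$$

belongs to $\mathcal{S}(\mathbb{R}^n)$ for every $\phi \in \mathcal{S}(\mathbb{R}^n)$. Using Corollary 10.1 we calculate its inverse
Fourier transform:

$$\mathcal{L}(\check S * \phi) = (2\pi)^{n/2}\,(\mathcal{F}'S)\cdot(\mathcal{L}\phi)$$

with $\mathcal{F}'S \in \mathcal{O}_m(\mathbb{R}^n)$ according to Theorem 10.4. Observe now that the definition of
the convolution of two distributions can be rewritten as $\langle T * S, \phi\rangle = \langle T, \check S * \phi\rangle$, for
all $\phi \in \mathcal{S}(\mathbb{R}^n)$. Hence, $T * S$ is a well-defined tempered distribution.

The inverse of $\mathcal{F}'$ is $\mathcal{L}'$. This implies

$$\langle T * S, \phi\rangle = \langle \mathcal{F}'T, \mathcal{L}(\check S * \phi)\rangle = (2\pi)^{n/2}\langle \mathcal{F}'T, (\mathcal{F}'S)\cdot\mathcal{L}\phi\rangle = (2\pi)^{n/2}\langle (\mathcal{F}'S)\cdot(\mathcal{F}'T), \mathcal{L}\phi\rangle$$

and therefore $T * S = (2\pi)^{n/2}\,\mathcal{L}'\bigl((\mathcal{F}'S)\cdot(\mathcal{F}'T)\bigr)$. Now Eq. (10.14) follows and we
conclude.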

Naturally one would like to extend the above convolution theorem to the case of
two tempered distributions, both not having a compact support. Then, in addition to
the problem of the existence of the convolution as a tempered distribution, one has
to solve the problem of multiplication of two tempered distributions. In analogy to
what we learned about the convolution in the space $\mathcal{D}'_+(\mathbb{R})$ of distributions on $\mathbb{R}$ with
support in $[0, \infty)$ we formulate the following result which has many applications, in
particular in mathematical physics.




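Theorem 10.6 (Generalized Convolution Theorem) Let $\Gamma \subset \mathbb{R}^n$ be a closed
convex cone such that $x \cdot y \ge 0$ for all $x, y \in \Gamma$ and denote by $\mathcal{S}'_\Gamma(\mathbb{R}^n)$ the space of
all tempered distributions on $\mathbb{R}^n$ with support in $\Gamma$. Then the following holds.

1. For all $T, S \in \mathcal{S}'_\Gamma(\mathbb{R}^n)$ the convolution product $T * S$ is a well-defined element in
$\mathcal{S}'_\Gamma(\mathbb{R}^n)$.

2. The product $(\mathcal{F}'T)\cdot(\mathcal{F}'S)$ is well defined as a distribution in $\mathcal{S}'(\mathbb{R}^n)$ by the formula
(10.14):

$$(\mathcal{F}'T)\cdot(\mathcal{F}'S) = (2\pi)^{-n/2}\,\mathcal{F}'(T * S). \tag{10.15}$$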
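Proof According to Corollary 5.2, given $T, S \in \mathcal{S}'_\Gamma(\mathbb{R}^n)$, there are continuous
functions $t, s$ with support in $\Gamma$, satisfying $\int |t(x)|(1 + x^2)^{-m/2}\,dx = C_1 < \infty$,
respectively $\int |s(x)|(1 + x^2)^{-k/2}\,dx = C_2 < \infty$ for some $m, k \in \mathbb{N}$, and some
multi-indices $\alpha, \beta$ such that $T = (-1)^{|\alpha|}D^\alpha t$, respectively $S = (-1)^{|\beta|}D^\beta s$.
Now for all $\phi \in \mathcal{S}(\mathbb{R}^n)$ we estimate as follows, using $D^\alpha D^\beta\phi = \phi^{(\alpha+\beta)}$:

$$|\langle T \otimes S(x, y), \phi(x + y)\rangle| = \Bigl|\int_\Gamma\int_\Gamma t(x)\,s(y)\,\phi^{(\alpha+\beta)}(x + y)\,dx\,dy\Bigr|$$
$$= \Bigl|\int_\Gamma\int_\Gamma t(x)(1 + x^2)^{-\frac{m}{2}}\,s(y)(1 + y^2)^{-\frac{k}{2}}\,(1 + x^2)^{\frac{m}{2}}(1 + y^2)^{\frac{k}{2}}\,\phi^{(\alpha+\beta)}(x + y)\,dx\,dy\Bigr|$$
$$\le \int_\Gamma |t(x)|(1 + x^2)^{-\frac{m}{2}}\,dx \int_\Gamma |s(y)|(1 + y^2)^{-\frac{k}{2}}\,dy \times \sup_{(x,y)\in\Gamma\times\Gamma}(1 + x^2)^{\frac{m}{2}}(1 + y^2)^{\frac{k}{2}}\,|\phi^{(\alpha+\beta)}(x + y)|.$$

Since $\phi$ belongs to $\mathcal{S}(\mathbb{R}^n)$ we know

$$|\phi^{(\alpha+\beta)}(x + y)| \le p_{M,l}(\phi)\,(1 + (x + y)^2)^{-M/2}$$

for $l = |\alpha + \beta|$ and $M = 1, 2, \ldots$. The assumed properties of the cone $\Gamma$ imply that
for $(x, y) \in \Gamma \times \Gamma$ one has $1 + x^2 \le 1 + (x + y)^2$ and $1 + y^2 \le 1 + (x + y)^2$ and
therefore on $\Gamma \times \Gamma$

$$\frac{(1 + x^2)^{\frac{m}{2}}(1 + y^2)^{\frac{k}{2}}}{(1 + (x + y)^2)^{\frac{M}{2}}} \le (1 + x^2)^{\frac{m}{2}-\frac{M}{4}}(1 + y^2)^{\frac{k}{2}-\frac{M}{4}}.$$

Now choose $M \ge 2\max\{m, k\}$ so that $m/2 \le M/4$ and $k/2 \le M/4$ and the above
supremum over $\Gamma \times \Gamma$ is finite and estimated by $C\,p_{M,l}(\phi)$ with a suitable constant
$C$. We conclude

$$|\langle T \otimes S(x, y), \phi(x + y)\rangle| \le C\,C_1 C_2\,p_{M,l}(\phi)$$

for all $\phi \in \mathcal{S}(\mathbb{R}^n)$ and thus the convolution $T * S$ is well defined as a tempered
distribution by the standard formula

$$T * S(\phi) = \langle T \otimes S(x, y), \phi(x + y)\rangle.$$

This formula also shows that $T * S$ has its support in $\Gamma$. Since the Fourier transform
is an isomorphism of the space of tempered distributions the second statement
follows immediately from the first.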




Remark 10.2

1. If a cone $\Gamma$ satisfies our hypothesis, then the cone $-\Gamma$ satisfies this condition
too. The light cone $V_+ = \{(x_0, x) \in \mathbb{R} \times \mathbb{R}^{n-1} : x_0 \ge |x|\}$ of physics certainly
satisfies our condition.

2. Given distributions $T, S \in \mathcal{S}'_\Gamma(\mathbb{R}^n)$ one can approximate $S$ for instance by
distributions $S_R = \chi_R S$ of compact support by using a sequence of smooth cut-off
functions $\chi_R \in \mathcal{D}(\mathbb{R}^n)$ such that $\chi_R(x) = 1$ for all $x$ with $|x| \le R$. The convolution
theorem applies to the pair $(T, S_R)$. The proof of the generalized convolution
theorem can be extended to show that the limit

$$\lim_{R\to\infty} T * S_R$$

exists in $\mathcal{S}'_\Gamma(\mathbb{R}^n)$. Since the Fourier transform $\mathcal{F}'$ of tempered distributions is
continuous it follows that the limit

$$\lim_{R\to\infty} \mathcal{F}'(T)\cdot\mathcal{F}'(S_R)$$

exists in $\mathcal{S}'(\mathbb{R}^n)$.


We started the study of the Fourier transformation on the space $L^1(\mathbb{R}^n)$. We found
that the domain and the range of $\mathcal{F}$ are not symmetric. However, when we restricted
$\mathcal{F}$ to the test function space $\mathcal{S}(\mathbb{R}^n)$ we could prove that the domain and the range are
the same; actually we found that $\mathcal{F}$ is an isomorphism of topological vector spaces
and used this to extend the definition of the Fourier transformation to the space
of all tempered distributions $\mathcal{S}'(\mathbb{R}^n)$, using duality. Certainly, the space $L^1(\mathbb{R}^n)$ is
contained in $\mathcal{S}'(\mathbb{R}^n)$, in the sense of the embedding $L^1(\mathbb{R}^n) \ni f \mapsto I_f \in \mathcal{S}'(\mathbb{R}^n)$. In
this sense there are many other function spaces contained in $\mathcal{S}'(\mathbb{R}^n)$, for instance the
space $L^2(\mathbb{R}^n)$ of (equivalence classes of) square integrable functions which is known
to be a Hilbert space with inner product

$$\langle f, g\rangle_2 = \int_{\mathbb{R}^n} \overline{f(x)}\,g(x)\,dx \qquad \forall\, f, g \in L^2(\mathbb{R}^n).$$

This is discussed in Sect. 14.1. There we also learn that the test function space $\mathcal{S}(\mathbb{R}^n)$ is
dense in $L^2(\mathbb{R}^n)$. Since $L^2(\mathbb{R}^n)$ is 'contained' in $\mathcal{S}'(\mathbb{R}^n)$, the restriction of the Fourier
transformation $\mathcal{F}'$ to $L^2(\mathbb{R}^n)$ gives a definition of the Fourier transformation on
$L^2(\mathbb{R}^n)$. More precisely this means the following: Denote the Fourier transformation
on $L^2(\mathbb{R}^n)$ by $\mathcal{F}_2$; it is defined by the identity

$$\mathcal{F}'I_f = I_{\mathcal{F}_2 f} \qquad \forall\, f \in L^2(\mathbb{R}^n).$$

In order to get a more concrete representation of $\mathcal{F}_2$ and to study some of its properties
we use our results on the Fourier transform on $\mathcal{S}(\mathbb{R}^n)$ and combine them with Hilbert
space methods as developed in part II.




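To begin we show that the restriction of the inner product of $L^2(\mathbb{R}^n)$ to $\mathcal{S}(\mathbb{R}^n)$ is
invariant under $\mathcal{F}$. First we observe that for all $\phi, \psi \in \mathcal{S}(\mathbb{R}^n)$ one has

$$\langle \mathcal{F}\phi, \mathcal{F}\psi\rangle_2 = \langle I_1, \overline{\mathcal{F}\phi}\cdot\mathcal{F}\psi\rangle.$$

Express the complex conjugate of the Fourier transform of $\phi$ as $\overline{\mathcal{F}\phi} = \mathcal{L}(\overline\phi) = \mathcal{F}(\check{\overline\phi})$
and apply Corollary 10.1 to get $\overline{\mathcal{F}\phi}\cdot\mathcal{F}\psi = (2\pi)^{-n/2}\,\mathcal{F}(\check{\overline\phi} * \psi)$. It follows that, using
$\mathcal{F}'I_1 = (2\pi)^{n/2}\delta$,

$$\langle \mathcal{F}\phi, \mathcal{F}\psi\rangle_2 = \langle I_1, (2\pi)^{-n/2}\mathcal{F}(\check{\overline\phi} * \psi)\rangle = (2\pi)^{-n/2}\langle \mathcal{F}'I_1, \check{\overline\phi} * \psi\rangle = \langle \delta, \check{\overline\phi} * \psi\rangle = (\check{\overline\phi} * \psi)(0) = \int_{\mathbb{R}^n} \overline{\phi(x)}\,\psi(x)\,dx = \langle \phi, \psi\rangle_2,$$

and thus we get the announced invariance

$$\langle \mathcal{F}\phi, \mathcal{F}\psi\rangle_2 = \langle \phi, \psi\rangle_2 \qquad \forall\,\phi, \psi \in \mathcal{S}(\mathbb{R}^n).$$

This nearly proves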


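Theorem 10.7 (Plancherel) The Fourier transformation $\mathcal{F}_2$ on $L^2(\mathbb{R}^n)$ can be obtained
as follows: Given any $f \in L^2(\mathbb{R}^n)$ choose a sequence $(u_j)_{j\in\mathbb{N}}$ in $\mathcal{S}(\mathbb{R}^n)$ which
converges to $f$ (in $L^2(\mathbb{R}^n)$). Then the sequence $(\mathcal{F}u_j)_{j\in\mathbb{N}}$ is a Cauchy sequence in
$L^2(\mathbb{R}^n)$ which thus converges to some element $g \in L^2(\mathbb{R}^n)$ which defines $\mathcal{F}_2 f$, i.e.,

$$\mathcal{F}_2 f = \lim_{j\to\infty} \mathcal{F}u_j.$$

$\mathcal{F}_2$ is a well-defined unitary map of the Hilbert space $L^2(\mathbb{R}^n)$.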


Proof Since we know that $\mathcal{S}(\mathbb{R}^n)$ is dense in $L^2(\mathbb{R}^n)$ and that the inner product $\langle\cdot,\cdot\rangle_2$
is invariant under the Fourier transformation $\mathcal{F}$ on $\mathcal{S}(\mathbb{R}^n)$, this follows easily from
Proposition 23.3.

The relation of the Fourier transformation on the various spaces can be summa- 
rized by the following diagram: 


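$$\begin{array}{ccc}
L^2(\mathbb{R}^n) & \xrightarrow{\ \mathcal{F}_2\ } & L^2(\mathbb{R}^n) \\
{\scriptstyle\mathrm{id}}\,\big\downarrow & & \big\downarrow\,{\scriptstyle\mathrm{id}} \\
\mathcal{S}'(\mathbb{R}^n) & \xrightarrow{\ \mathcal{F}'\ } & \mathcal{S}'(\mathbb{R}^n)
\end{array}$$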


All maps in the diagram are continuous and linear. F2 is unitary. 


Remark 10.3 The fact that the Fourier transformation $\mathcal{F}_2$ is a unitary map of the
Hilbert space $L^2(\mathbb{R}^n)$ is of particular importance to the quantum mechanics of localized
systems since it allows us to pass from the coordinate representation $L^2(\mathbb{R}^n_x)$ of
the state space to the momentum representation $L^2(\mathbb{R}^n_p)$ without changing expectation
values.




Corollary 10.2 (Fourier Uncertainty Principle) For $f \in L^2(\mathbb{R})$ with $xf \in L^2(\mathbb{R})$
and $f' \in L^2(\mathbb{R})$ one has

$$\frac{1}{2}\|f\|_2^2 \le \|p\mathcal{F}f\|_2\,\|xf\|_2 \tag{10.16}$$

with equality if $f$ is a Gaussian, i.e., $f(x) = c\,e^{-a^2x^2}$ with some nonzero constants
$a, c$.


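Proof Write $|f(x)| = \sqrt{1 + x^2}\,|f(x)| \times 1/\sqrt{1 + x^2}$ and apply the Cauchy–Schwarz
inequality to conclude that $xf \in L^2(\mathbb{R})$ implies $f \in L^1(\mathbb{R})$. Thus, $\mathcal{F}_2(f) = \mathcal{F}(f)$
is a continuous function which vanishes at infinity (Riemann–Lebesgue Lemma).
Similarly, the relation $p\mathcal{F}(f)(p) = -\mathrm{i}\mathcal{F}(f')(p)$ and the assumption $f' \in L^2(\mathbb{R})$
imply that $p\mathcal{F}(f)(p)$ is a continuous function which vanishes at infinity. It follows
that the boundary term generated by partial integration in the following integral
vanishes and we get, using the abbreviation $D_p = \frac{d}{dp}$,

$$\int_{\mathbb{R}} p\,D_p|\mathcal{F}(f)(p)|^2\,dp = -\int_{\mathbb{R}} |\mathcal{F}(f)(p)|^2\,dp.$$

An elementary calculation shows

$$\int_{\mathbb{R}} p\,D_p|\mathcal{F}(f)(p)|^2\,dp = 2\,\Re\,\langle p\mathcal{F}(f), D_p(\mathcal{F}(f))\rangle_2$$

and thus by Plancherel's Theorem

$$\|f\|_2^2 = \|\mathcal{F}(f)\|_2^2 = -2\,\Re\,\langle p\mathcal{F}(f), D_p(\mathcal{F}(f))\rangle_2.$$

The right-hand side is estimated by $2\,\|p\mathcal{F}(f)\|_2\,\|D_p(\mathcal{F}(f))\|_2$, where we used the
Cauchy–Schwarz inequality, and therefore, since $D_p(\mathcal{F}(f)) = \mathcal{F}(-\mathrm{i}xf)$, we get altogether,
using again Plancherel's Theorem,

$$\|f\|_2^2 \le 2\,\|p\mathcal{F}(f)\|_2\,\|D_p(\mathcal{F}(f))\|_2 = 2\,\|p\mathcal{F}(f)\|_2\,\|xf\|_2,$$

which is (10.16).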
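Inequality (10.16) can be illustrated numerically without computing any Fourier transform, because the relation $p\mathcal{F}(f)(p) = -\mathrm{i}\mathcal{F}(f')(p)$ together with Plancherel's Theorem gives $\|p\mathcal{F}f\|_2 = \|f'\|_2$. The sketch below relies on that identity; the helper names, the integration window, and the two sample functions (a Gaussian and $x\,e^{-x^2/2}$, with derivatives supplied by hand) are illustrative assumptions.

```python
import math

def l2_norm(func, half_width=12.0, steps=24000):
    """Approximate the L^2(R) norm of func by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(func(x)) ** 2
    return math.sqrt(total * h)

def uncertainty_sides(f, df):
    """Return (1/2 ||f||^2, ||f'|| * ||x f||); ||f'|| stands in for ||p F f|| by Plancherel."""
    lhs = 0.5 * l2_norm(f) ** 2
    rhs = l2_norm(df) * l2_norm(lambda x: x * f(x))
    return lhs, rhs

a = 0.7
gauss = lambda x: math.exp(-(a * x) ** 2)
dgauss = lambda x: -2.0 * a * a * x * math.exp(-(a * x) ** 2)
hermite1 = lambda x: x * math.exp(-0.5 * x * x)
dhermite1 = lambda x: (1.0 - x * x) * math.exp(-0.5 * x * x)

gauss_lhs, gauss_rhs = uncertainty_sides(gauss, dgauss)
herm_lhs, herm_rhs = uncertainty_sides(hermite1, dhermite1)
print(gauss_lhs, gauss_rhs)  # equal up to quadrature error (Gaussian case)
print(herm_lhs, herm_rhs)    # strict inequality for the non-Gaussian sample
```

For the second sample the exact values are $\frac12\|f\|_2^2 = \sqrt{\pi}/4$ and $\|f'\|_2\|xf\|_2 = 3\sqrt{\pi}/4$, so the two sides differ by a factor of 3.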


10.4 Some Applications 


This section deals with several aspects of the solution theory for linear partial differ- 
ential operators with constant coefficients in the framework of tempered distributions, 
which arise from the fact that for tempered distributions the Fourier transformation 
is available. The results will be considerably stronger. 

Central to the solution theory for linear partial differential operators with constant
coefficients in the space of tempered distributions is the following result by L.
Hörmander, see reference [7].




Theorem 10.8 (L. Hörmander) Suppose $P$ is a polynomial in $n$ variables with
complex coefficients, $P \neq 0$. Then the following holds:

a) For every $T \in \mathcal{S}'(\mathbb{R}^n)$ there is an $S \in \mathcal{S}'(\mathbb{R}^n)$ such that

$$P \cdot S = T.$$

b) If the polynomial $P$ has no real roots, then the equation $P \cdot S = T$ has exactly
one solution $S$.


The proof of this core result is far beyond the scope of our elementary introduction, 
and we have to refer to the book [7]. But we would like to give a few comments 
indicating the difficulties involved. 

Introduce the set of roots or zeros of the polynomial: 


N(P) = {x € R": P(x) =0}. 


If the polynomial $P$ has no real roots, then it is easy to see that $\frac{1}{P}$ belongs to the
multiplier space $\mathcal{O}_m(\mathbb{R}^n)$ of tempered distributions and thus the equation $P \cdot S = T$
has the unique solution $S = \frac{1}{P}\cdot T$.

But we know that in general $N(P)$ is not empty. In the case of one variable $N(P)$
is a discrete set (see the fundamental theorem of algebra, Corollary 9.4). For $n \ge 2$
the set of roots of a polynomial can be a fairly complicated set embedded in $\mathbb{R}^n$; in
some cases it is a differentiable manifold of various dimensions, in other cases it is
more complicated than a differentiable manifold. In the Exercises we consider some
examples.

On the set $\mathbb{R}^n \setminus N(P)$ the solution $S$ has to be of the form $\frac{1}{P}\cdot T$, in some way. But
$\frac{1}{P}$ can fail to be locally integrable. Accordingly the problem is: define a distribution
$\bigl[\frac{1}{P}\bigr] \in \mathcal{S}'(\mathbb{R}^n)$ with the properties

$$P \cdot \Bigl[\frac{1}{P}\Bigr] = I_1,$$

and the product $\bigl[\frac{1}{P}\bigr]\cdot T$ of the two tempered distributions
is a well-defined tempered distribution.
As an illustration we look at the simplest nontrivial case, i.e., $n = 1$ and $P(x) = x$.
In the section on the convergence of sequences of distributions we have already
encountered tempered distributions $\bigl[\frac{1}{x}\bigr]$ which satisfy $x \cdot \bigl[\frac{1}{x}\bigr] = 1$, namely the
distributions

$$\frac{1}{x \pm \mathrm{i}o}.$$

Then, given $T \in \mathcal{S}'(\mathbb{R})$, it is not clear whether we can multiply $T$ with these
distributions. Hörmander's theorem resolves this problem.




Naturally, in the general case where the structure of the set of roots of $P$ is much
more complicated these two steps are much more involved. There are a number of
important consequences of Hörmander's theorem.


Corollary 10.3 Suppose that $P(D) = \sum_{|\alpha|\le N} a_\alpha D^\alpha$, $a_\alpha \in \mathbb{C}$, is a constant
coefficient partial differential operator, $P \neq 0$. Then the following holds.

a) $P(D)$ has a tempered elementary solution $E_P \in \mathcal{S}'(\mathbb{R}^n)$.

b) If $P(\mathrm{i}x)$ has no real roots, then there is exactly one tempered elementary
solution $E_P$.

c) For every $T \in \mathcal{S}'(\mathbb{R}^n)$ there is an $S \in \mathcal{S}'(\mathbb{R}^n)$ such that

$$P(D)S = T,$$

i.e., every linear partial differential equation with constant coefficients $P(D)S = T$,
$T \in \mathcal{S}'(\mathbb{R}^n)$, has at least one tempered solution.


Proof We discuss only the easy part of the proof. For $S \in \mathcal{S}'(\mathbb{R}^n)$ we calculate first

$$\mathcal{F}'(P(D)S) = \sum_{|\alpha|\le N} a_\alpha\,\mathcal{F}'(D^\alpha S) = \sum_{|\alpha|\le N} a_\alpha\,(\mathrm{i}p)^\alpha\,\mathcal{F}'S,$$

where in the last step we used the third part of Theorem 10.3. This implies: given
$T \in \mathcal{S}'(\mathbb{R}^n)$, a distribution $S \in \mathcal{S}'(\mathbb{R}^n)$ solves the partial differential equation $P(D)S = T$
if, and only if, $\tilde S = \mathcal{F}'S$ solves the algebraic equation

$$P(\mathrm{i}p)\tilde S = \tilde T$$

with $\tilde T = \mathcal{F}'T$. Now recall $\mathcal{F}'\delta = (2\pi)^{-n/2}I_1$. According to Theorem 10.8 there is
$\tilde E_P \in \mathcal{S}'(\mathbb{R}^n)$ such that $P(\mathrm{i}p)\tilde E_P = (2\pi)^{-n/2}I_1$, and $\tilde E_P$ is unique if $P(\mathrm{i}p)$
has no real roots. By applying the inverse Fourier transformation we deduce that a
(exactly one) tempered elementary solution

$$E_P = \mathcal{L}'\tilde E_P$$

exists. This proves parts a) and b). For the proof of the third part we have to refer
to Hörmander. In many cases one can find a tempered elementary solution $E_P$ such
that the convolution product $E_P * T$ exists. Then a solution is

$$S = E_P * T.$$

As we know this is certainly the case if $T$ has compact support.


10.4.1 Examples of Tempered Elementary Solutions 


For several simple partial differential operators with constant coefficients, which 
play an important role in physics, we calculate the tempered elementary solution 
explicitly. 


10.4.1.1 The Laplace Operator $\Delta_3$ in $\mathbb{R}^3$


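A fundamental solution $E_3$ for the Laplace operator $\Delta_3$ satisfies the equation $\Delta_3 E_3 = \delta$. By taking the Fourier transform of this equation, $-p^2 \mathcal{F}'E_3 = \mathcal{F}'\delta = (2\pi)^{-3/2} I_3$, we find

$$(\mathcal{F}'E_3)(p) = (2\pi)^{-3/2}\,\frac{-1}{p^2}, \qquad p^2 = p_1^2 + p_2^2 + p_3^2.$$

Since $p \mapsto \frac{1}{p^2}$ is locally integrable on $\mathbb{R}^3$, $\mathcal{F}'E_3$ is a regular distribution and its inverse Fourier transform can be calculated explicitly. For $\phi \in \mathcal{S}(\mathbb{R}^3)$ we proceed as follows:

$$\begin{aligned}
(E_3, \phi) = (\mathcal{F}'E_3, \mathcal{L}\phi)
&= \frac{-1}{(2\pi)^3} \int_{\mathbb{R}^3} \frac{1}{p^2} \left( \int_{\mathbb{R}^3} e^{ip\cdot x}\phi(x)\,dx \right) dp \\
&= \frac{-1}{(2\pi)^3} \lim_{R\to\infty} \int_{|p|\le R} \frac{1}{p^2} \left( \int_{\mathbb{R}^3} e^{ip\cdot x}\phi(x)\,dx \right) dp \\
&= \frac{-1}{(2\pi)^3} \lim_{R\to\infty} \int_{\mathbb{R}^3} \left( \int_{|p|\le R} \frac{e^{ip\cdot x}}{p^2}\,dp \right) \phi(x)\,dx \\
&= \frac{-1}{(2\pi)^3} \lim_{R\to\infty} \int_{\mathbb{R}^3} \left( \int_0^R \int_0^\pi 2\pi\, e^{i|x|\rho\cos\theta} \sin\theta \,d\theta\, d\rho \right) \phi(x)\,dx \\
&= \frac{-1}{(2\pi)^2} \lim_{R\to\infty} \int_{\mathbb{R}^3} \left( \int_0^R \frac{e^{i|x|\rho} - e^{-i|x|\rho}}{i|x|\rho}\,d\rho \right) \phi(x)\,dx \\
&= \frac{-1}{(2\pi)^2} \int_{\mathbb{R}^3} \left( \int_0^\infty \frac{e^{i\lambda} - e^{-i\lambda}}{i\lambda}\,d\lambda \right) \frac{\phi(x)}{|x|}\,dx.
\end{aligned}$$

The exchange of the order of integration is justified by Fubini's theorem. Recalling the integral

$$\int_0^\infty \frac{e^{i\lambda} - e^{-i\lambda}}{i\lambda}\,d\lambda = 2\int_0^\infty \frac{\sin\lambda}{\lambda}\,d\lambda = \pi$$

we thus get

$$(E_3, \phi) = \frac{-1}{4\pi} \int_{\mathbb{R}^3} \frac{\phi(x)}{|x|}\,dx, \quad \text{i.e.,} \quad E_3(x) = \frac{-1}{4\pi|x|}.$$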
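The result $E_3(x) = -1/(4\pi|x|)$ can be sanity-checked numerically. The following is a minimal sketch, not part of the original text: it assumes the illustrative test function $\phi(x) = e^{-|x|^2}$ and evaluates the pairing $(E_3, \Delta_3\phi)$ in radial coordinates, which should reproduce $\phi(0) = 1$. The helper names `simpson` and `pairing_E3_with_laplacian_of_gaussian` are hypothetical.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n is forced to be even."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def pairing_E3_with_laplacian_of_gaussian(r_max=12.0):
    """Evaluate (E_3, Laplace(phi)) for phi(x) = exp(-|x|^2) in R^3.

    Assumption (illustrative choice): phi is the Gaussian exp(-r^2),
    whose Laplacian in R^3 is (4 r^2 - 6) exp(-r^2).
    """
    def integrand(r):
        laplacian_phi = (4.0 * r * r - 6.0) * math.exp(-r * r)
        # E_3(r) times the surface factor: -1/(4 pi r) * 4 pi r^2 = -r
        return -r * laplacian_phi

    return simpson(integrand, 0.0, r_max)


if __name__ == "__main__":
    # The value should be close to phi(0) = 1.
    print(pairing_E3_with_laplacian_of_gaussian())
```

The radial reduction works because both $E_3$ and the chosen $\phi$ are rotation invariant; for a general test function a full three-dimensional quadrature would be needed.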


10.4.1.2 Helmholtz' Differential Operator $\Delta_3 - \mu^2$


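Again, by Fourier transformation the partial differential equation for the fundamental solution $E_H$ of this operator is transformed into an algebraic equation for the Fourier transform: $(\Delta_3 - \lambda)E_H = \delta$ implies $(-p^2 - \lambda)\tilde E(p) = (2\pi)^{-3/2}$ with $\tilde E = \mathcal{F}'E_H$. Hence for $\lambda = \mu^2 > 0$ one finds that $P(ip) = -(p^2 + \mu^2)$ has no real roots and thus the division problem has a simple unique solution

$$\tilde E(p) = \frac{-1}{(2\pi)^{3/2}}\,\frac{1}{p^2 + \mu^2} \in L^2(\mathbb{R}^3).$$

The unique (tempered) fundamental solution of Helmholtz' operator thus is, for all $x \in \mathbb{R}^3 \setminus \{0\}$,

$$E_H(x) = (\mathcal{L}\tilde E)(x) = \frac{-1}{(2\pi)^3} \int_{\mathbb{R}^3} \frac{e^{ip\cdot x}}{p^2 + \mu^2}\,dp = \frac{-1}{4\pi}\,\frac{e^{-|\mu||x|}}{|x|}.$$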


The details of this calculation are given in the Exercises. 
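As a hedged outline of how this calculation can go (a sketch under the stated conventions, with $r = |x|$ and $\rho = |p|$ introduced here for illustration; the oscillatory radial integral is understood as an improper integral): choose polar coordinates with the polar axis along $x$ and then close the contour in the upper half plane, where the only pole is at $\rho = i|\mu|$.

```latex
\begin{aligned}
\int_{\mathbb{R}^3} \frac{e^{ip\cdot x}}{p^2+\mu^2}\,dp
 &= 2\pi \int_0^\infty \frac{\rho^2}{\rho^2+\mu^2}
    \left( \int_0^\pi e^{i\rho r\cos\theta}\sin\theta\,d\theta \right) d\rho
  = \frac{4\pi}{r} \int_0^\infty \frac{\rho\,\sin(\rho r)}{\rho^2+\mu^2}\,d\rho \\
 &= \frac{2\pi}{r}\,\operatorname{Im} \int_{-\infty}^{\infty}
    \frac{\rho\, e^{i\rho r}}{\rho^2+\mu^2}\,d\rho
  = \frac{2\pi}{r}\,\operatorname{Im}\left( 2\pi i \,
    \frac{i|\mu|\, e^{-|\mu| r}}{2 i|\mu|} \right)
  = \frac{2\pi^2}{r}\, e^{-|\mu| r}, \\
E_H(x) &= \frac{-1}{(2\pi)^3}\cdot\frac{2\pi^2}{r}\,e^{-|\mu| r}
  = \frac{-1}{4\pi}\,\frac{e^{-|\mu||x|}}{|x|}.
\end{aligned}
```

For $\mu \to 0$ this expression reduces to $-1/(4\pi|x|)$, the fundamental solution of the Laplace operator.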




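10.4.1.3 The Wave Operator $\Box_4$ in $\mathbb{R}^4$


In Proposition 8.5, it was shown that the distribution

$$E_r(x_0, x) = \frac{1}{2\pi}\,\theta(x_0)\,\delta(x_0^2 - x^2)$$

is an elementary solution of the wave operator. Here we illustrate the use of the Fourier transformation to prove this fact, in the case of a partial differential operator with more than one elementary solution.

It is a simple calculation to show that the assignment

$$\mathcal{S}(\mathbb{R}^4) \ni \phi \mapsto \frac{1}{2\pi} \int_{\mathbb{R}^3} \phi(|x|, x)\,\frac{d^3x}{2|x|}$$

defines a tempered distribution on $\mathbb{R}^4$. For any $\phi \in \mathcal{S}(\mathbb{R}^4)$, we calculate

$$(\mathcal{F}'E_r, \phi) = (E_r, \mathcal{F}\phi) = \frac{1}{2\pi} \int_{\mathbb{R}^3} (\mathcal{F}\phi)(|x|, x)\,\frac{d^3x}{2|x|}$$

and observe that this integral equals

$$\frac{1}{4\pi} \lim_{t \downarrow 0} \int_{\mathbb{R}^3} e^{-t|x|} (\mathcal{F}\phi)(|x|, x)\,\frac{d^3x}{|x|} = \lim_{t \downarrow 0} \int_{\mathbb{R}^4} I_t(p_0, p)\,\phi(p_0, p)\,dp_0\,d^3p,$$

where for $t > 0$

$$\begin{aligned}
I_t(p_0, p) &= \frac{1}{4\pi(2\pi)^2} \int_{\mathbb{R}^3} e^{-i|x|(p_0 - it) - ip\cdot x}\,\frac{d^3x}{|x|} \\
&= \frac{1}{(2\pi)^2}\,\frac{1}{2|p|} \left( \frac{1}{(p_0 - it) + |p|} - \frac{1}{(p_0 - it) - |p|} \right)
 = \frac{1}{(2\pi)^2}\,\frac{-1}{(p_0 - it)^2 - p^2}.
\end{aligned}$$

Since an elementary solution $E$ of $\Box_4$ satisfies $\Box_4 E = (\partial_0^2 - \Delta_3)E = \delta$ where $\partial_0 = \frac{\partial}{\partial x_0}$, its Fourier transform $\tilde E$ satisfies

$$(-p_0^2 + p^2)\,\tilde E(p_0, p) = (2\pi)^{-2} I_4.$$

The polynomial $P(p_0, p) = -p_0^2 + p^2$ vanishes on the cone $\{(p_0, p) \in \mathbb{R}^4 : p_0 = \pm|p|\}$ and therefore, by Hörmander's results (Corollary 10.4.1) one expects that the wave operator has more than one elementary solution according to the different possibilities to define $\frac{1}{P}$ as a tempered distribution. Standard choices are

$$\left[\frac{1}{P}\right]_\pm = \lim_{t \downarrow 0} \frac{-1}{(p_0 \pm it)^2 - p^2} = \lim_{t \downarrow 0} \frac{1}{2|p|} \left( \frac{1}{(p_0 \pm it) + |p|} - \frac{1}{(p_0 \pm it) - |p|} \right). \tag{10.17}$$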




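And with the above expression for $I_t(p_0, p)$ we find that $E_r$ corresponds to the choice of the minus sign in (10.17), since

$$\begin{aligned}
((-p_0^2 + p^2)\mathcal{F}'E_r, \phi) &= (\mathcal{F}'E_r, (-p_0^2 + p^2)\phi) \\
&= \lim_{t \downarrow 0} \int_{\mathbb{R}^4} I_t(p_0, p)(-p_0^2 + p^2)\,\phi(p_0, p)\,dp_0\,d^3p \\
&= (2\pi)^{-2} \int_{\mathbb{R}^4} \phi(p_0, p)\,dp_0\,d^3p = (2\pi)^{-2}(I_4, \phi) = (\mathcal{F}'\delta, \phi),
\end{aligned}$$

and thus indeed $\Box_4 E_r = \delta$.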
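The closed form of the regularized integral $I_t(p_0, p) = \frac{1}{4\pi(2\pi)^2}\int_{\mathbb{R}^3} e^{-i|x|(p_0 - it) - ip\cdot x}\,\frac{d^3x}{|x|} = \frac{1}{(2\pi)^2}\frac{-1}{(p_0 - it)^2 - p^2}$ can be checked numerically. The sketch below is an illustration only. It assumes the angular integration has already been carried out, which leaves the radial integral $\frac{4\pi}{|p|}\int_0^\infty e^{-(t + ip_0)r}\sin(|p|r)\,dr$. The function names and the sample values of $p_0$, $|p|$ and $t$ are hypothetical choices.

```python
import cmath
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] for real or complex integrands."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def I_t_numeric(p0, p_abs, t, r_max=80.0, n=80000):
    """Radial quadrature of I_t; r_max truncates the damped integrand."""
    a = complex(t, p0)  # damping t plus oscillation i*p0
    radial = simpson(lambda r: cmath.exp(-a * r) * math.sin(p_abs * r),
                     0.0, r_max, n)
    # prefactor 1/(4 pi (2 pi)^2) times the angular factor 4 pi / |p|
    return radial / (p_abs * (2.0 * math.pi) ** 2)


def I_t_closed(p0, p_abs, t):
    """Closed form -1 / ((2 pi)^2 ((p0 - i t)^2 - |p|^2))."""
    return -1.0 / ((2.0 * math.pi) ** 2 * ((p0 - 1j * t) ** 2 - p_abs ** 2))


if __name__ == "__main__":
    p0, p_abs, t = 1.3, 0.8, 0.5  # illustrative sample point
    print(I_t_numeric(p0, p_abs, t), I_t_closed(p0, p_abs, t))
```

The damping $t > 0$ is what makes the radial integral absolutely convergent; the limit $t \downarrow 0$ is then taken in the sense of tempered distributions, as in (10.17).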


10.4.1.4 Operator of Heat Conduction, Heat Equation 


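Suppose $E(t, x) \in \mathcal{S}'(\mathbb{R}^{n+1})$ satisfies the partial differential equation

$$(\partial_t - \Delta_n)E = \delta,$$

where $\partial_t = \frac{\partial}{\partial t}$, i.e., $E$ is an elementary solution of the differential operator of heat conduction. By Fourier transformation one obtains the algebraic equation

$$(ip_0 + p^2)(\mathcal{F}'E)(p_0, p) = (2\pi)^{-\frac{n+1}{2}} I_{n+1}, \qquad p_0 \in \mathbb{R},\ p \in \mathbb{R}^n.$$

Since $\frac{1}{ip_0 + p^2} \in L^1_{loc}(\mathbb{R}^{n+1})$, the solution of this equation is the regular distribution given by the function

$$\tilde E(p_0, p) = (2\pi)^{-\frac{n+1}{2}} (ip_0 + p^2)^{-1}.$$

Now consider the function

$$E(t, x) = \theta(t)(4\pi t)^{-\frac{n}{2}} e^{-\frac{x^2}{4t}}, \qquad t \in \mathbb{R},\ x \in \mathbb{R}^n.$$

Its Fourier transform is easily calculated (in the sense of functions):

$$\begin{aligned}
(\mathcal{F}E)(p_0, p) &= (2\pi)^{-\frac{n+1}{2}} (4\pi)^{-\frac{n}{2}} \int_0^\infty \int_{\mathbb{R}^n} e^{-ip_0 t} e^{-ip\cdot x}\, t^{-\frac{n}{2}} e^{-\frac{x^2}{4t}}\,dx\,dt \\
&= (2\pi)^{-\frac{n+1}{2}} (4\pi)^{-\frac{n}{2}} \int_0^\infty e^{-ip_0 t}\, t^{-\frac{n}{2}} (4\pi t)^{\frac{n}{2}} e^{-tp^2}\,dt \\
&= (2\pi)^{-\frac{n+1}{2}}\, \frac{1}{ip_0 + p^2}.
\end{aligned}$$

We conclude that $E$ is a tempered elementary solution of the operator $\partial_t - \Delta_n$.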
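Two properties of the kernel $E(t, x) = \theta(t)(4\pi t)^{-n/2} e^{-x^2/(4t)}$ are easy to check numerically: for $t > 0$ it satisfies the homogeneous heat equation, and its integral over $x$ equals 1, which is what produces the $\delta$ at $t = 0$. The sketch below assumes $n = 1$ and uses plain finite differences. The step size and the sample points are illustrative choices, not taken from the text.

```python
import math


def heat_kernel(t, x):
    """E(t, x) = theta(t) (4 pi t)^(-1/2) exp(-x^2 / (4 t)) for n = 1."""
    if t <= 0.0:
        return 0.0
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def pde_residual(t, x, h=1e-3):
    """Central-difference approximation of (d/dt - d^2/dx^2) E at (t, x), t > h."""
    dt = (heat_kernel(t + h, x) - heat_kernel(t - h, x)) / (2.0 * h)
    dxx = (heat_kernel(t, x + h) - 2.0 * heat_kernel(t, x)
           + heat_kernel(t, x - h)) / (h * h)
    return dt - dxx


def total_mass(t, half_length=40.0, n=8000):
    """Trapezoid-type sum of E(t, .) over [-half_length, half_length]."""
    h = 2.0 * half_length / n
    return h * sum(heat_kernel(t, -half_length + k * h) for k in range(n + 1))


if __name__ == "__main__":
    print(pde_residual(0.7, 0.3))  # small, of the order of the discretisation error
    print(total_mass(0.05))        # close to 1
```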


10.4.1.5 Free Schrödinger Operator in $\mathbb{R}^n$


The partial differential operator 

a) 

‘ot = An 
is called the free Schrödinger operator of dimension $n$. In the Exercises it is shown
that the function

Es(t,x) = 0(t)(4rit)2e7 

defines a tempered distribution which solves the equation az —A,)Es = 6 in 
$\mathcal{S}'(\mathbb{R}^{n+1})$ and therefore it is a tempered elementary solution.




Other examples of elementary solutions and Green functions are given in the 
book [2]. 


10.4.1.6 Some Comments 


There is an important difference in the behaviour of solutions of the heat equation 
and the wave equation: The propagation speed of solutions of the wave equation 
is finite and is determined by the ‘speed parameter’ in this equation. However, the 
propagation speed of heat according to the heat equation is infinite! Certainly this 
is physically not realistic. Nevertheless, the formula $u(t, x) = E * U_0$ ($E$ is the
elementary solution given above) implies that an initial heat source $U_0$ localized in
the neighbourhood of some point $x_0$ will cause an effect $u(t, x) \neq 0$ at a point $x$ which
is at an arbitrary distance from $x_0$, within a time $t > 0$ which is arbitrarily small.


10.4.2. Summary of Properties of the Fourier Transformation 


In a short table we summarize the basic properties and some important relations for the
Fourier transformation. Following the physicists' convention, we denote the variables
for the functions in the domain of the Fourier transformation by $x$ and the variables for
functions in the range of the Fourier transformation by $p$. Though all statements have
a counterpart in the general case, we present the one-dimensional case in our table.

For a function $f$ we denote by $\tilde f$ its Fourier transform $\mathcal{F}f$. In the table we use
the words 'strongly decreasing' to express that a function $f$ satisfies the condition
defined by Eq. (2.10).

As a summary of the table one can mention the following rule of thumb: If $f$ or
$T \in \mathcal{S}'(\mathbb{R})$ decays sufficiently rapidly at "infinity," then $\tilde f$, respectively $\tilde T$, is smooth,
i.e., is a differentiable function, and conversely.

In the literature there are a good number of books giving detailed tables where the 
Fourier transforms of explicitly given functions are calculated. We mention the book 
by F. Oberhettinger, entitled "Fourier transforms of distributions and their inverses:
a collection of tables," Academic Press, New York, 1973.




Properties of Fourier transformation + 


Properties in x-space Properties in p-space 
decay for $|x| \to \infty$ local regularity
1) feL'(R) 1 fe@¢(R) and lim),|-...f(p) =0 
2) f strongly decreasing 2) fe€-(R)NL'(R) 
3) fe-Y(Ry) 3) fE7(Ry) 
4) f(x) =e" ,a>0 4) F(p)=e# 
5) f€L'(R), suppf C [—a,a],a>0 |5) f analytic on C 
bounded by const e“l?| 
6) T€./'(R), suppT C [—a,a],a >0]6) T analytic on C, bounded 
by O(p)e“'”!, O polynomial 
7) TES (R) 1) TES (R) 
growth for $|x| \to \infty$ local singularity
8) Lew 8) 8(p), 6(p—a) 
9) x” 9) 8(™(p) 
10) multiplication with (ix)” 10) differential operator CaM 
11) (+x) 11) FEST 
12) signx 12) vp . 
13) A(x)x"—! 13) eae 
10.5 Exercises 
1. For $f \in L^1(\mathbb{R}^n)$ show:

$$\|f - f_a\|_1 \to 0 \quad \text{as } a \to 0.$$

Hints: Consider first continuous functions of compact support. Then approximate elements of $L^1(\mathbb{R}^n)$ accordingly.

2. Prove the first two properties of the Fourier transformation mentioned in Proposition 10.1:
a) For $f \in L^1(\mathbb{R}^n)$ and $a \in \mathbb{R}^n$ the translation by $a$ is defined as $f_a(x) = f(x - a)$ for almost all $x \in \mathbb{R}^n$. These translations and the multiplication by a corresponding exponential function are related under the Fourier transformation according to the following formulae:

$$\mathcal{F}(e^{ia\cdot x} f)(p) = (\mathcal{F}f)_a(p) \quad \forall p \in \mathbb{R}^n,$$

$$(\mathcal{F}f_a)(p) = e^{-ia\cdot p}(\mathcal{F}f)(p) \quad \forall p \in \mathbb{R}^n.$$

b) For any $\lambda > 0$ define the scaled function $f_\lambda$ by $f_\lambda(x) = f(\frac{x}{\lambda})$ for almost all $x \in \mathbb{R}^n$. Then, for $f \in L^1(\mathbb{R}^n)$ one has

$$(\mathcal{F}f_\lambda)(p) = \lambda^n (\mathcal{F}f)(\lambda p) \quad \forall p \in \mathbb{R}^n.$$

3. Prove: If $f \in L^1(\mathbb{R})$ has a derivative $f' \in L^1(\mathbb{R})$, then $f(x) \to 0$ as $|x| \to \infty$.

4. Show the embedding relation (10.6).

5. Prove: If $I$ is an isomorphism of the Hausdorff locally convex topological vector space $E$, then the adjoint $I'$ is an isomorphism of the topological dual $E'$ equipped with the weak topology $\sigma$.


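6. Show that the right-hand side of inequality (10.13) is a polynomially bounded function of $p \in \mathbb{R}^n$.

7. Prove the following relation:

$$(I_{\mathcal{F}f}, \phi) = (I_f, \mathcal{F}\phi) \quad \forall \phi \in \mathcal{D}(\mathbb{R}^n),\ \forall f \in L^1(\mathbb{R}^n).$$

8. Show that the Fourier transform $\mathcal{F}\phi$ of a test function $\phi \in \mathcal{D}(\mathbb{R}^n)$ is the restriction of an entire function to $\mathbb{R}^n$, i.e., $\mathcal{F}\phi$ is the restriction to $\mathbb{R}^n$ of the function

$$\mathbb{C}^n \ni z = (z_1, \ldots, z_n) \mapsto (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} e^{-iz\cdot\xi} \phi(\xi)\,d\xi$$

which is holomorphic on $\mathbb{C}^n$. Conclude: if $\phi \neq 0$, then $\mathcal{F}\phi$ cannot be a test function in $\mathcal{D}(\mathbb{R}^n)$ (it cannot have a compact support).

9. For any $a > 0$ introduce the function $g_a(x) = \prod_{k=1}^n e^{-a|x_k|}$ and show that

$$I_n(\mathcal{F}\phi) = \int_{\mathbb{R}^n} (\mathcal{F}\phi)(p)\,dp = (2\pi)^{\frac{n}{2}} \phi(0).$$

10. Assume that we know $\mathcal{F}'I_n = (2\pi)^{\frac{n}{2}}\delta$. Then show that $\mathcal{L}(\mathcal{F}\phi)(x) = \phi(x)$ for all $x \in \mathbb{R}^n$ and all $\phi \in \mathcal{S}(\mathbb{R}^n)$, i.e., $\mathcal{L}\mathcal{F} = \mathrm{id}$ on $\mathcal{S}(\mathbb{R}^n)$.

Hint: In a straightforward calculation use the relation $e^{ip\cdot x}(\mathcal{F}\phi)(p) = (\mathcal{F}\phi_{-x})(p)$.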

11. Define the action of a rotation $R$ of $\mathbb{R}^n$ on tempered distributions $T$ on $\mathbb{R}^n$ by $R \cdot T = T \circ R^{-1}$ where $T \circ R^{-1}$ is defined by Eq. (4.6). Prove that the Fourier transformation $\mathcal{F}'$ commutes with this action of rotations:

$$R \cdot (\mathcal{F}'T) = \mathcal{F}'(R \cdot T) \quad \forall T \in \mathcal{S}'(\mathbb{R}^n).$$

Conclude: If a distribution $T$ is invariant under a rotation $R$ (i.e., $R \cdot T = T$), so is its Fourier transform.

Hints: Show first that $(\mathcal{F}\phi) \circ R = \mathcal{F}(\phi \circ R)$ for all test functions $\phi$.

In the notation of Lemma 10.3 show that the function R” 5 p > prim(€p - 
(—ix)* - uw) is polynomially bounded, for any w € N” and any u € D(R"). 
13. Calculate the integral

$$\frac{1}{(2\pi)^3} \int_{\mathbb{R}^3} \frac{e^{ip\cdot x}}{p^2 + \mu^2}\,dp = \frac{1}{4\pi}\,\frac{e^{-|\mu||x|}}{|x|}.$$

Hints: Introduce polar coordinates and apply the Theorem of Residues 9.7.

14. For $a > 0$ introduce the function $g_a$ on $\mathbb{R}$ defined by $g_a(p) = \frac{1}{\pi}\,\frac{a}{a^2 + p^2}$ and show that for $a, b > 0$ one has

$$g_a * g_b = g_{a+b}.$$

Hint: Recall Example 10.1.3 and the convolution theorem for functions in $L^1(\mathbb{R})$.

15. Find a tempered elementary solution of the free Schrödinger operator.

References 


1. Chandrasekharan K. Classical Fourier transforms. Universitext. New York: Springer-Verlag; 
1989. 

2. Dewitt-Morette C, Dillard-Bleick M, Choquet-Bruhat Y. Analysis, manifolds and physics (2 
Volumes). Amsterdam: North-Holland; 1982. 

3. Donoghue WF. Distributions and Fourier transforms. New York: Academic; 1969. 

4. Dunford N, Schwartz JT. Linear operators. Part I: general theory. New York: Interscience 
Publisher; 1958. 

5. Hörmander L. The analysis of linear partial differential operators. 1. Distribution theory and
Fourier analysis. Berlin: Springer-Verlag; 1983.

6. Hörmander L. The analysis of linear partial differential operators. 2. Differential operators of
constant coefficients. Berlin: Springer-Verlag; 1983.

7. Mikusinski J, Sikorski K. The elementary theory of distributions. Warsaw: PWN; 1957. 


Chapter 11 
Distributions as Boundary Values of Analytic 
Functions 


For reasons explained earlier we introduced various classes of distributions as ele- 
ments of the topological dual of suitable test function spaces. Later we learned that 
distributions can also be defined as equivalence classes of certain Cauchy sequences 
of smooth functions or, locally, as finite order weak derivatives of continuous func- 
tions. In this chapter we learn that distributions have another characterization, namely 
as finite sums of boundary values of analytic functions. 

This section introduces the subject for the case of one variable. We begin by considering a simple example discussed earlier from a different perspective. The function $z \mapsto \frac{1}{z}$ is analytic in $\mathbb{C}\setminus\{0\}$ and thus in particular in the upper and lower half planes $H_+$ and $H_-$,

$$H_\pm = \{z = x + iy \in \mathbb{C} : x \in \mathbb{R},\ \pm y > 0\}.$$

The limits in $\mathcal{D}'(\mathbb{R})$ of the function $f(z) = \frac{1}{z} = \frac{1}{x + iy}$ exist for $y \to 0$, $z = x + iy \in H_\pm$, as we saw earlier,

$$\lim_{y \to 0,\ y > 0} \frac{1}{x \pm iy} = \frac{1}{x \pm i0}.$$

The distributions $\frac{1}{x \pm i0}$ are called the boundary values of the analytic function $z \mapsto \frac{1}{z}$ restricted to the half planes $H_\pm$. In Sect. 3.2 we established the following relations between these boundary values with two other distributions, namely

$$\frac{1}{x + i0} - \frac{1}{x - i0} = -2\pi i\,\delta, \qquad \frac{1}{x + i0} + \frac{1}{x - i0} = 2\,\mathrm{vp}\,\frac{1}{x},$$

i.e., Dirac's delta distribution and Cauchy's principal value are represented as finite sums of boundary values of the function $z \mapsto \frac{1}{z}$, $z \in \mathbb{C}\setminus\{0\}$. In this chapter we will learn that every distribution can be represented as a finite sum of boundary values of analytic functions.
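The first of these relations can be observed numerically. The sketch below is an illustration under assumptions that are not in the text: the test function is replaced by the Gaussian $\phi(x) = e^{-x^2}$ (rapidly decreasing rather than compactly supported), and the substitution $x = y\tan\theta$ is used to resolve the narrow peak of $\frac{1}{x+iy} - \frac{1}{x-iy} = \frac{-2iy}{x^2+y^2}$. For this $\phi$ the pairing equals $-2\pi i\,e^{y^2}\operatorname{erfc}(y)$, which tends to $-2\pi i\,\phi(0)$ as $y \to 0$.

```python
import math


def delta_pairing(y, n=200000):
    """Integral of [1/(x+iy) - 1/(x-iy)] * exp(-x^2) over the real line, y > 0.

    Uses x = y tan(theta), which gives y dx / (x^2 + y^2) = dtheta,
    and the midpoint rule on (-pi/2, pi/2).
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * h
        x = y * math.tan(theta)
        total += math.exp(-x * x)
    return -2j * total * h


if __name__ == "__main__":
    for y in (0.1, 0.01):
        exact = -2j * math.pi * math.exp(y * y) * math.erfc(y)
        # numerical pairing, closed form for the Gaussian, and the limit value
        print(y, delta_pairing(y), exact, -2j * math.pi)
```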


© Springer International Publishing Switzerland 2015 163 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_11 




Recall that $\mathcal{A}(H_\pm)$ stands for the algebra of functions which are analytic on $H_\pm$. Every $F \in \mathcal{A}(H_+)$ defines naturally a family $F_y$, $y > 0$, of regular distributions on $\mathbb{R}$ according to the formula

$$(F_y, \phi) = \int_{\mathbb{R}} F(x + iy)\phi(x)\,dx \quad \forall \phi \in \mathcal{D}(\mathbb{R}). \tag{11.1}$$

The basic definition of a boundary value now reads as follows.


Definition 11.1 A function $F \in \mathcal{A}(H_+)$ is said to have a boundary value $F_+ \in \mathcal{D}'(\mathbb{R})$, if and only if, the family of regular distributions $F_y$, $y > 0$, has a limit in $\mathcal{D}'(\mathbb{R})$, for $y \to 0$, i.e., for every $\phi \in \mathcal{D}(\mathbb{R})$,

$$\lim_{y \to 0} (F_y, \phi)$$

exists in $\mathbb{C}$. The boundary value $F_+ \in \mathcal{D}'(\mathbb{R})$ is usually denoted by $F(x + i0) \equiv F_+$.

The following result is a concrete characterization of those analytic functions which have a boundary value in the space of distributions.

Theorem 11.1 A holomorphic function $F_\pm \in \mathcal{A}(H_\pm)$ has a boundary value in $\mathcal{D}'(\mathbb{R})$ if it satisfies the following condition:

For every compact set $K \subset \mathbb{R}$ there are a positive constant $C$ and an integer $m \in \mathbb{N}$ such that for all $x \in K$ and all $|y| \in (0, 1]$ the estimate

$$|F_\pm(x + iy)| \le \frac{C}{|y|^m} \tag{11.2}$$

holds.


Proof Consider the case of the upper half plane. In order to show that the above condition is sufficient for the existence of a boundary value of $F$ one has to show that under this condition, for each $\phi \in \mathcal{D}(\mathbb{R})$, the auxiliary function

$$g(y) = (F_y, \phi) = \int_{\mathbb{R}} F(x + iy)\phi(x)\,dx, \qquad y > 0,$$

has a limit for $y \to 0$.

It is clear (for instance by Corollary 6.1) that this function $g$ is of class $C^\infty((0, \infty))$. Since $F$ is holomorphic, it satisfies the Cauchy-Riemann equations $\partial_y F = i\partial_x F$ for all $z = x + iy \in H_+$. (Recall that $\partial_x$ stands for $\frac{\partial}{\partial x}$.) This allows us to express derivatives of $g$ as follows, for $n \in \mathbb{N}$ and any $y > 0$:

$$g^{(n)}(y) = \int_{\mathbb{R}} (\partial_y)^n F(x + iy)\phi(x)\,dx = \int_{\mathbb{R}} (i\partial_x)^n F(x + iy)\phi(x)\,dx = (-i)^n \int_{\mathbb{R}} F(x + iy)\phi^{(n)}(x)\,dx.$$

The Taylor expansion of $g$ at $y = 1$ reads

$$g(y) = \sum_{j=0}^{n} \frac{g^{(j)}(1)}{j!}(y - 1)^j + R_n(y), \qquad R_n(y) = \frac{1}{n!} \int_1^y (y - t)^n g^{(n+1)}(t)\,dt.$$

This expansion shows that $g(y)$ has a limit for $y \to 0$, if and only if, the remainder $R_n(y)$ does, for some $n \in \mathbb{N}$. Apply the hypothesis on $F$ for the compact set $K = \operatorname{supp}\phi$. This then gives a constant $C$ and an integer $m \in \mathbb{N}$ such that the estimate of our hypothesis holds. For this integer we deduce ($|K|$ denotes the Lebesgue measure of the set $K$)

$$|(y - t)^m g^{(m+1)}(t)| \le (t - y)^m \int_{\mathbb{R}} |F(x + it)|\,|\phi^{(m+1)}(x)|\,dx \le C|K| \sup_{x \in K} |\phi^{(m+1)}(x)|\,\frac{(t - y)^m}{t^m} \le C|K| \sup_{x \in K} |\phi^{(m+1)}(x)|$$

for all $0 < y \le t \le 1$, and this implies that for $n = m$ the remainder has a limit for $y \to 0$, and this limit is

$$\lim_{y \to 0,\ y > 0} R_m(y) = \frac{(-1)^{m+1}}{m!} \int_0^1 t^m g^{(m+1)}(t)\,dt.$$

Since $\phi \in \mathcal{D}(\mathbb{R})$ is arbitrary, we conclude by Theorem 3.3 that $F$ has a boundary value in $\mathcal{D}'(\mathbb{R})$.

The restriction of the function $z \mapsto \frac{1}{z}$ to $H_\pm$ certainly belongs to $\mathcal{A}(H_\pm)$ and clearly these two analytic functions satisfy the condition (11.2), hence by Theorem 11.1 they have boundary values $\frac{1}{x \pm i0}$. Thus we find on the basis of a general result what we have shown earlier by direct estimates. There we have also shown that the difference of these two boundary values equals $2\pi i\,\delta$. In the section on convolution we learned that $T * \delta = T$ for all $T \in \mathcal{D}'(\mathbb{R})$. Thus one would conjecture that every distribution on $\mathbb{R}$ is the difference of boundary values of analytic functions on the upper, respectively lower, half plane. This conjecture is indeed true. We begin with the easy case of distributions of compact support.


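Theorem 11.2 If $T \in \mathcal{E}'(\mathbb{R})$ has the compact support $K$, then there is a holomorphic function $\hat T$ on $\mathbb{C}\setminus K$ such that for all $f \in \mathcal{D}(\mathbb{R})$,

$$T(f) = \lim_{\varepsilon \searrow 0} \int_{\mathbb{R}} [\hat T(x + i\varepsilon) - \hat T(x - i\varepsilon)]\,f(x)\,dx. \tag{11.3}$$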


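Proof For every $z \in \mathbb{R}^c = \mathbb{C}\setminus\mathbb{R}$ the Cauchy kernel $t \mapsto \frac{1}{2\pi i}\,\frac{1}{t - z}$ belongs to $\mathcal{E}(\mathbb{R})$. Hence a function $\hat T : \mathbb{R}^c \to \mathbb{C}$ is well defined by

$$\hat T(z) = \frac{1}{2\pi i}\left(T(t), \frac{1}{t - z}\right).$$

Since there is an $m \in \mathbb{N}$ such that $T$ satisfies the estimate

$$|T(f)| \le C \sup_{t \in K,\ \nu \le m} |D^\nu f(t)|,$$

we find immediately the estimate

$$|\hat T(x + iy)| \le C|y|^{-m-1} \quad \forall x \in \mathbb{R},\ \forall y \neq 0.$$

Furthermore, the estimate for $T$ implies that $\hat T$ can be analytically continued to $K^c = \mathbb{C}\setminus K$. For all $z, \zeta \in K^c$ one has, for $z \neq \zeta$,

$$\frac{1}{z - \zeta}\left(\frac{1}{t - z} - \frac{1}{t - \zeta}\right) = \frac{1}{(t - z)(t - \zeta)}.$$

As $\zeta \to z$ the right-hand side converges to $\frac{1}{(t - z)^2}$ in $\mathcal{E}(\mathbb{R})$. We conclude that

$$\frac{\hat T(z) - \hat T(\zeta)}{z - \zeta} = \frac{1}{2\pi i}\left(T(t), \frac{1}{(t - z)(t - \zeta)}\right) \to \frac{1}{2\pi i}\left(T(t), \frac{1}{(t - z)^2}\right),$$

hence $\hat T$ is complex differentiable on $K^c$.

Now for $z = x + iy$, $y > 0$, we calculate $\hat T(z) - \hat T(\bar z) = (T(t), \chi_y(t - x))$ where $\chi_y(t) = \frac{1}{\pi}\,\frac{y}{t^2 + y^2}$. This allows us to write, for $f \in \mathcal{D}(\mathbb{R})$,

$$\int [\hat T(x + iy) - \hat T(x - iy)]\,f(x)\,dx = \int (T(t), \chi_y(t - x))\,f(x)\,dx.$$

In the Exercises of the chapter on convolution products (Sect. 7.4) we have shown that this equals

$$(T(t), (\chi_y * f)(t)).$$

According to the Breit-Wigner Formula (3.9) $\chi_y \to \delta$ as $y \searrow 0$, hence $(\chi_y * f) \to f$ in $\mathcal{E}(\mathbb{R})$ as $y \searrow 0$, and it follows that $(T(t), (\chi_y * f)(t)) \to (T(t), f(t)) = T(f)$ as $y \searrow 0$. We conclude that the formula (11.3) holds.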

Note that in Theorem 11.2 the condition f € D(R) cannot be replaced by f € 
E(R). A careful inspection of the proof however shows that formula (11.3) can be 
extended to all f € E(R) which are bounded and which have bounded derivatives. 
In this case the convolution products occurring in the proof are well defined too. 
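A concrete instance may help to see formula (11.3) at work. The sketch below is an illustration only: it assumes that $T$ is the regular distribution given by the indicator function of $[-1, 1]$, for which $\hat T(z) = \frac{1}{2\pi i}\int_{-1}^{1}\frac{dt}{t - z} = \frac{1}{2\pi i}\left[\operatorname{Log}(1 - z) - \operatorname{Log}(-1 - z)\right]$ with the principal branch of the logarithm (valid because $t - z$ stays in one half plane for real $t$). The jump $\hat T(x + i\varepsilon) - \hat T(x - i\varepsilon)$ should then approach 1 for $|x| < 1$ and 0 for $|x| > 1$. The function names are hypothetical.

```python
import cmath
import math


def cauchy_transform(z):
    """T_hat(z) = (1/(2 pi i)) * integral of dt/(t - z) over [-1, 1], Im z != 0.

    Assumption (illustrative): T is the indicator function of [-1, 1].
    """
    return (cmath.log(1.0 - z) - cmath.log(-1.0 - z)) / (2j * math.pi)


def jump(x, eps):
    """T_hat(x + i eps) - T_hat(x - i eps)."""
    return cauchy_transform(complex(x, eps)) - cauchy_transform(complex(x, -eps))


if __name__ == "__main__":
    eps = 1e-6
    print(jump(0.0, eps))  # close to 1: inside the support
    print(jump(2.0, eps))  # close to 0: outside the support
```

The pointwise convergence seen here is a feature of this piecewise continuous example; in general the limit in (11.3) exists only after integration against a test function.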

When one wants to extend Theorem 11.2 to the case of general distributions 
T € D’(R) one faces the problem that the Cauchy kernel belongs to E(R) but not to 
D(R). Thus a suitable approximation of T by distributions with compact support is 
needed. As shown in Theorem 5.9 of the book [1] this strategy is indeed successful 
(See also [2]). 


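Theorem 11.3 For every $T \in \mathcal{D}'(\mathbb{R})$ there is an analytic function $F$ on $K^c$, $K = \operatorname{supp} T$, satisfying the growth condition (11.2) on $H_\pm$ such that

$$T(f) = \lim_{\varepsilon \searrow 0} \int_{\mathbb{R}} [F(x + i\varepsilon) - F(x - i\varepsilon)]\,f(x)\,dx \tag{11.4}$$

for all $f \in \mathcal{D}(\mathbb{R})$. One writes $T(x) = F(x + i0) - F(x - i0)$.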

Similar results are available for distributions of more than one variable. This case 
is much more difficult than the one-dimensional case for a variety of reasons. Let 
us mention the basic ones. (1) One has to find an appropriate generalization of the 
process of taking boundary values from above and below the real line. (2) In the 
theory of analytic functions of more than one complex variable one encounters a 
number of subtle difficulties absent in the one-dimensional theory. 

We sketch the solution due to A. Martineau [5]. Suppose that $U \subset \mathbb{C}^n$ is a pseudo-convex open set (for the definition of this concept we have to refer to Definition 2.6.8 of the book [4]) and $\Gamma \subset \mathbb{R}^n$ an open convex cone. Suppose furthermore that $F$ is a holomorphic function on $U_\Gamma = (\mathbb{R}^n + i\Gamma) \cap U$ which satisfies the following condition: For every compact subset $K \subset \Omega = \mathbb{R}^n \cap U$ and every closed subcone $\Gamma' \subset \Gamma$ there are positive constants $C$ and $k$ such that

$$\sup_{x \in K} |F(x + iy)| \le C|y|^{-k} \quad \forall y \in \Gamma'. \tag{11.5}$$

Then $F(x + iy)$ has the boundary value $F(x + i\Gamma 0)$ which is a distribution on $\Omega$ and, as $y$ tends to zero in a closed subcone $\Gamma' \subset \Gamma$,

$$F(x + iy) \to F(x + i\Gamma 0) \quad \text{in } \mathcal{D}'(\Omega). \tag{11.6}$$

For the converse suppose that a distribution $T \in \mathcal{D}'(\Omega)$ is given on $\Omega = \mathbb{R}^n \cap U$. Then there are open convex cones $\Gamma_1, \ldots, \Gamma_m$ in $\mathbb{R}^n$ such that their dual cones $\Gamma_1^o, \ldots, \Gamma_m^o$ cover the dual space of $\mathbb{R}^n$ ($\Gamma_j^o = \{\xi \in \mathbb{R}^n : \xi \cdot x \ge 0\ \forall x \in \Gamma_j\}$) and holomorphic functions $F_j$ on $U_{\Gamma_j}$, $j = 1, \ldots, m$, each satisfying the growth condition (11.5), such that $T$ is the sum of the boundary values of these holomorphic functions:

$$T(x) = F_1(x + i\Gamma_1 0) + \cdots + F_m(x + i\Gamma_m 0). \tag{11.7}$$


11.1 Exercises 


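1. For $n = 1, 2, \ldots$ define $f_n(z) = \frac{1}{z^n}$, $z \in \mathbb{C}\setminus\{0\}$ and show that the functions $f_n^\pm = f_n|H_\pm$ have boundary values $\frac{1}{(x \pm i0)^n}$ in $\mathcal{D}'(\mathbb{R})$. Then prove the formula

$$\frac{1}{(x \pm i0)^{n+1}} = \frac{(-1)^n}{n!}\,D^n \frac{1}{x \pm i0}$$

where $D$ denotes the distributional derivative.

2. For $f \in L^1(\mathbb{R})$ define two functions $F_\pm$ on $H_\pm$ by the formula

$$F_\pm(z) = \frac{1}{2\pi i} \int_{\mathbb{R}} \frac{f(x)}{x - z}\,dx \quad \forall z \in H_\pm.$$

Show:
a) $F_\pm$ is well defined and is estimated by

$$|F_\pm(x + iy)| \le \frac{1}{2\pi|y|}\,\|f\|_1 \quad \forall z = x + iy \in H_\pm.$$

b) $F_\pm$ is holomorphic on $H_\pm$.

c) $F_\pm$ has a boundary value $f_\pm \in \mathcal{D}'(\mathbb{R})$.

d) For a Hölder-continuous function $f \in L^1(\mathbb{R})$ show that the boundary values are given by

$$f_\pm = \pm\frac{1}{2} f - \frac{1}{2\pi i}\left(\mathrm{vp}\,\frac{1}{x}\right) * f$$

and deduce $f = f_+ - f_-$.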


3. a) Suppose a function $f \in L^1_{loc}(\mathbb{R})$ has its support in $\mathbb{R}_+$ and there are some constants $a$, $C$ such that $|f(\xi)| \le Ce^{a\xi}$ for almost all $\xi \in \mathbb{R}_+$. Introduce the half plane $H_a = \{z \in \mathbb{C} : \operatorname{Re} z > a\}$ and show that

$$\hat f(z) = \int_0^\infty e^{-z\xi} f(\xi)\,d\xi \tag{11.8}$$

is a well defined analytic function on $H_a$.
b) Suppose a distribution $u \in \mathcal{E}'(\mathbb{R})$ has its support in the interval $[-a, a]$, for some $a > 0$. Prove that

$$\hat u(z) = (u(\xi), e^{-z\xi}) \tag{11.9}$$

is a well defined analytic function on the complex plane $\mathbb{C}$ and show that there
is a constant $C$ such that


|a(z)| < Cet®* VzeEC. 


The function $\hat f$ is called the Laplace transform of the function $f$, usually written
as $\hat f(z) = (\mathcal{L}f)(z)$, and similarly the function $\hat u$ is called the Laplace transform
of the distribution $u \in \mathcal{E}'(\mathbb{R})$, also denoted usually by $\hat u(z) = (\mathcal{L}u)(z)$. For further
details on the Laplace transform and related transformations see [3, 6].

Hints: For the proof of the second part one can use the representation of 
distributions as weak derivatives of functions. 


References 


1. Bremermann H. Complex variables and Fourier transforms. Reading: Addison-Wesley; 1965. 

2. Cartan H. Elementary theory of analytic functions of one or several complex variables. Mineola, 
NY: Dover Publications; 1995. 

3. Davies B. Applied mathematical sciences. In: John F, Sirovich L, LaSalle JP, Whitham GB, 
editors. Integral transforms and their applications, vol. 25, 3rd ed. Berlin: Springer; 2002. 

4. Hörmander L. An introduction to complex analysis in several variables. Princeton: Van Nostrand;
1967. 

5. Martineau A. Distributions et valeur au bord des fonctions holomorphes. In: Theory of 
distributions, Proc. Intern. Summer Inst. Lisboa: Inst. Gulbenkian de Ciéncia; 1964. 
pp. 193-326. 

6. Widder DV. Pure and applied mathematics. In: Smith PA, Eilenberg S, editors. An introduction 
to transform theory, vol. 42. New York: Academic; 1971. 


Chapter 12 
Other Spaces of Generalized Functions 


For a nonempty open set $\Omega \subset \mathbb{R}^n$, we have introduced three classes of distributions or generalized functions, distributions with compact support $\mathcal{E}'(\Omega)$, tempered distributions $\mathcal{S}'(\Omega)$, and general distributions $\mathcal{D}'(\Omega)$ and we have found that these spaces of distributions are related by the inclusions

$$\mathcal{E}'(\Omega) \subset \mathcal{S}'(\Omega) \subset \mathcal{D}'(\Omega).$$

These distributions are often called Schwartz distributions. They have found numerous applications in mathematics and physics. One of the most prominent areas of successful applications of Schwartz distributions and their subclasses has been the solution theory of linear partial differential operators with constant coefficients as it is documented in the monograph of L. Hörmander [1, 2]. Though distributions do not admit, in general, a product, certain subclasses have been successfully applied in solving many important classes of nonlinear partial differential equations. These classes of distributions are the Sobolev spaces $W^{m,p}(\Omega)$, $m \in \mathbb{N}$, $1 \le p < \infty$, $\Omega \subset \mathbb{R}^n$ open and nonempty, and related spaces. We will use them in solving some nonlinear partial differential equations through the variational approach in Part III.

In physics, mainly tempered distributions are used, since Fourier transformation 
is a very important tool in connecting the position representation with the momentum 
representation of the theory. The class of tempered distributions is the only class of 
Schwartz distributions, which is invariant under the Fourier transformation. General 
relativistic quantum field theory in the sense of Gårding and Wightman [3-6] is based
on the theory of tempered distributions. 

All Schwartz distributions are localizable and the notion of support is well defined 
for them via duality and the use of compactly supported test functions. Furthermore 
all these distributions are locally of finite order. This gives Schwartz distributions a 
relatively simple structure but limits their applicability in an essential way. Another 
severe limitation for the use of tempered distributions in physics is the fact that they 
allow only polynomial growth, but in physics one often has to deal with exponential
functions, for instance $e^x$, $x \in \mathbb{R}$, which is a distribution but not a tempered
distribution on $\mathbb{R}$. These are some very important reasons to look for more general
classes of generalized functions than the Schwartz distributions. And certainly a sys- 
tematic point of view invites a study of other classes of generalized functions too. 


© Springer International Publishing Switzerland 2015 169 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_12 




Accordingly, we discuss the most prominent spaces of generalized functions which 
are known today from the point of view of their applicability to a solution theory of 
more general partial differential operators and in physics. However, we do not give 
proofs in this chapter since its intention is just to inform about the existence of these 
other spaces of generalized functions and to stimulate some interest. 

The first section presents the generalized functions with test function spaces 
of Gelfand type S. The next section introduces hyperfunctions and in particular 
Fourier hyperfunctions and the final section explains ultradistributions according to 
Komatsu. 


12.1 Generalized Functions of Gelfand Type S 


The standard reference for this section is Chap. IV of [7] in which one finds all the 
proofs for the statements. 

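Denote $\mathbb{N}_0 = \{0, 1, 2, 3, \ldots\}$ and introduce for $0 < a, L < \infty$, $j \in \mathbb{N}$, $m \in \mathbb{N}_0$ the following functions on $C^\infty(\mathbb{R}^n)$:

$$q(f; a, m, L, j) = \sup\left\{ \frac{|x^k D^\alpha f(x)|}{(L + \frac{1}{j})^{|k|}\,k^{ak}} : x \in \mathbb{R}^n,\ k, \alpha \in \mathbb{N}_0^n,\ |\alpha| \le m \right\}. \tag{12.1}$$

The set of all functions $f \in C^\infty(\mathbb{R}^n)$ for which $q(f; a, m, L, j)$ is finite for all $m \in \mathbb{N}_0$ and all $j \in \mathbb{N}$ is denoted by

$$\mathcal{S}_{a,L}(\mathbb{R}^n). \tag{12.2}$$

Equipped with the system of norms $q(\cdot\,; a, m, L, j)$, $m \in \mathbb{N}_0$, $j \in \mathbb{N}$, it is a Fréchet space. Finally, we take the inductive limit of these spaces with respect to $L$ to get

$$\mathcal{S}_a(\mathbb{R}^n) = \operatorname{ind\,lim}_{L > 0}\, \mathcal{S}_{a,L}(\mathbb{R}^n). \tag{12.3}$$

For a nonempty open subset $\Omega \subset \mathbb{R}^n$, the spaces $\mathcal{S}_a(\Omega)$ are defined in the same way by replacing $C^\infty(\mathbb{R}^n)$ by $C^\infty(\Omega)$ and by taking the supremum over $x \in \Omega$ instead of $x \in \mathbb{R}^n$.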

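Some basic properties of the class of spaces $\mathcal{S}_a(\Omega)$, $a > 0$, are collected in


Proposition 12.1 Suppose $0 < a \le a'$ and consider any open nonempty subset $\Omega \subset \mathbb{R}^n$, then

$$\mathcal{D}(\Omega) \subset \mathcal{S}_a(\Omega) \subset \mathcal{S}_{a'}(\Omega) \subset \mathcal{S}(\Omega). \tag{12.4}$$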


In this chain each space is densely contained in its successor and all the embeddings 
are continuous. 

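Similarly we introduce another class of test function spaces $\mathcal{S}^b(\mathbb{R}^n)$ distinguished by a parameter $b > 0$. For $f \in C^\infty(\mathbb{R}^n)$, $m \in \mathbb{N}_0$ and $j \in \mathbb{N}$ define

$$p(f; b, m, M, j) = \sup\left\{ \frac{|x^k D^\alpha f(x)|}{(M + \frac{1}{j})^{|\alpha|}\,\alpha^{b\alpha}} : x \in \mathbb{R}^n,\ k, \alpha \in \mathbb{N}_0^n,\ |k| \le m \right\}. \tag{12.5}$$

The set of all $f \in C^\infty(\mathbb{R}^n)$ for which $p(f; b, m, M, j)$ is finite for all $m \in \mathbb{N}_0$ and all $j \in \mathbb{N}$ is denoted by $\mathcal{S}^{b,M}(\mathbb{R}^n)$. Equipped with the system of norms $p(\cdot\,; b, m, M, j)$, $j \in \mathbb{N}$, $m \in \mathbb{N}_0$, the spaces

$$\mathcal{S}^{b,M}(\mathbb{R}^n) \tag{12.6}$$

are Fréchet spaces (see Chap. IV of [7]). Again we take the inductive limit of these spaces with respect to $M > 0$ to obtain

$$\mathcal{S}^b(\mathbb{R}^n) = \operatorname{ind\,lim}_{M > 0}\, \mathcal{S}^{b,M}(\mathbb{R}^n). \tag{12.7}$$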


Note the important difference in the definition of the spaces $\mathcal{S}^b(\mathbb{R}^n)$ and the spaces $\mathcal{S}_a(\mathbb{R}^n)$. In the definition of the continuous norms for these spaces, the roles of multiplication with powers of the variable $x$ and the derivative monomials $D^\alpha$ have been exchanged and therefore according to the results on the Fourier transformation (see Proposition 10.1), one would expect that the Fourier transform maps these spaces into each other. Indeed the precise statement about this connection is contained in the following proposition.


Proposition 12.2 The Fourier transformation $\mathcal{F}$ is a homeomorphism $\mathcal{S}^b(\mathbb{R}^n) \to \mathcal{S}_b(\mathbb{R}^n)$.
Suppose $0 < b \le b'$, then

$$\mathcal{S}^b(\mathbb{R}^n) \subset \mathcal{S}^{b'}(\mathbb{R}^n) \subset \mathcal{S}(\mathbb{R}^n). \tag{12.8}$$

In this chain each space is densely contained in its successor and all the embeddings are continuous.

The elements of $\mathcal{S}^1(\mathbb{R}^n)$ are analytic functions and those of $\mathcal{S}^b(\mathbb{R}^n)$ for $0 < b < 1$ are entire analytic.

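A third class of test function spaces of type $\mathcal{S}$ is the intersection of the spaces defined above. They can be defined directly as an inductive limit of spaces $\mathcal{S}_{a,L}^{b,M}(\mathbb{R}^n)$ with respect to $L, M > 0$. To this end, consider the following system of norms on $C^\infty(\mathbb{R}^n)$, for $L, M > 0$ and $j, m \in \mathbb{N}$:

$$q(f; a, b, m, j, L, M) = \sup\left\{ \frac{|x^k D^\alpha f(x)|}{(L + \frac{1}{j})^{|k|}(M + \frac{1}{m})^{|\alpha|}\,k^{ak}\alpha^{b\alpha}} : x \in \mathbb{R}^n,\ k, \alpha \in \mathbb{N}_0^n \right\}. \tag{12.9}$$

Denote the set of functions $f \in C^\infty(\mathbb{R}^n)$ for which $q(f; a, b, m, j, L, M)$ is finite for all $m, j \in \mathbb{N}$ by

$$\mathcal{S}_{a,L}^{b,M}(\mathbb{R}^n). \tag{12.10}$$

Equipped with the system of norms $q(\cdot\,; a, b, m, j, L, M)$, $m, j \in \mathbb{N}$, the space $\mathcal{S}_{a,L}^{b,M}(\mathbb{R}^n)$ is a Fréchet space. The third class of test function spaces is now defined by

$$\mathcal{S}_a^b(\mathbb{R}^n) = \operatorname{ind\,lim}_{M > 0,\ L > 0}\, \mathcal{S}_{a,L}^{b,M}(\mathbb{R}^n) \tag{12.11}$$

for $a, b > 0$. For a function $f \in C^\infty(\mathbb{R}^n)$ to be an element of $\mathcal{S}_a^b(\mathbb{R}^n)$, it has to satisfy the constraints both from $\mathcal{S}_a(\mathbb{R}^n)$ and $\mathcal{S}^b(\mathbb{R}^n)$ with the effect that for certain values of the parameters $a, b > 0$, only the trivial function $f = 0$ is allowed.


Proposition 12.3 The spaces $\mathcal{S}_a^b(\mathbb{R}^n)$ are not trivial if, and only if,

$$a + b \ge 1,\ a > 0,\ b > 0 \quad \text{or} \quad a = 0,\ b > 1 \quad \text{or} \quad a > 1,\ b = 0.$$

The Fourier transformation $\mathcal{F}$ is a homeomorphism $\mathcal{S}_a^b(\mathbb{R}^n) \to \mathcal{S}_b^a(\mathbb{R}^n)$.
Suppose $0 < a \le a'$ and $0 < b \le b'$ such that the space $\mathcal{S}_a^b(\mathbb{R}^n)$ is not trivial, then $\mathcal{S}_a^b(\mathbb{R}^n)$ is densely contained in $\mathcal{S}_{a'}^{b'}(\mathbb{R}^n)$ and the natural embedding is continuous.
In addition, we have the following continuous embeddings:

$$\mathcal{S}_a^b(\mathbb{R}^n) \subset \mathcal{S}(\mathbb{R}^n), \qquad \mathcal{S}_a^b(\mathbb{R}^n) \subset \mathcal{S}_a(\mathbb{R}^n), \qquad \mathcal{S}_a^b(\mathbb{R}^n) \subset \mathcal{S}^b(\mathbb{R}^n). \tag{12.12}$$

The elements in $\mathcal{S}_a^1(\mathbb{R}^n)$ are analytic functions and those in $\mathcal{S}_a^b(\mathbb{R}^n)$ for $0 < b < 1$ are entire analytic, i.e., they have extensions to analytic, respectively to entire analytic, functions.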
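A standard example of a nontrivial element in the borderline case $a + b = 1$ is the Gaussian $f(x) = e^{-x^2/2}$ on $\mathbb{R}$, which belongs to $\mathcal{S}_{1/2}^{1/2}(\mathbb{R})$. The sketch below illustrates only the decay part of this statement (the case of no derivatives), under the assumption that the decay condition takes the classical form $|x^k f(x)| \le C L^k k^{ak}$. It compares a grid estimate of $\sup_x |x|^k e^{-x^2/2}$ with the closed form $(k/e)^{k/2}$, attained at $x = \sqrt{k}$, which is exactly $L^k k^{k/2}$ for $L = e^{-1/2}$. The function names and grid parameters are hypothetical.

```python
import math


def sup_xk_gaussian(k, x_max=12.0, steps=60000):
    """Grid estimate of sup over x >= 0 of x^k * exp(-x^2 / 2)."""
    best = 0.0
    for i in range(steps + 1):
        x = x_max * i / steps
        best = max(best, x ** k * math.exp(-0.5 * x * x))
    return best


def closed_form(k):
    """Exact supremum (k / e)^(k / 2), attained at x = sqrt(k)."""
    return (k / math.e) ** (k / 2.0)


if __name__ == "__main__":
    L = math.exp(-0.5)
    for k in range(1, 7):
        numeric = sup_xk_gaussian(k)
        # the ratio against L^k * k^(k/2) stays bounded (here it is close to 1)
        print(k, numeric, closed_form(k), numeric / (L ** k * k ** (k / 2.0)))
```

A bounded ratio for all $k$ with $a = 1/2$ is what the decay condition asks for; a faster growth of the suprema in $k$ would force a larger value of $a$.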

The topological dual $\mathcal{S}_a^b(\mathbb{R}^n)'$ of $\mathcal{S}_a^b(\mathbb{R}^n)$ defines the class of generalized functions of Gelfand type $\mathcal{S}_a^b$. Thus we get a two-parameter family of spaces of generalized functions. Since $\mathcal{S}_a^b(\mathbb{R}^n) \subset \mathcal{S}(\mathbb{R}^n)$ with continuous embedding, we know that these new classes of generalized functions contain the space of tempered distributions:

$$\mathcal{S}'(\mathbb{R}^n) \subset \mathcal{S}_a^b(\mathbb{R}^n)'.$$


There are three important aspects under which one can look at these various spaces 
of generalized functions: 


a) Does this space of generalized functions admit the Fourier transformation as a 
homeomorphism (isomorphism)? 

b) Are the generalized functions of this space localizable? 

c) Are the Fourier transforms of the generalized functions of the space localizable? 


These questions are relevant in particular for applications to the theory of partial 
differential operators and in mathematical physics (relativistic quantum field theory). 

One can show that the spaces $\mathcal{S}_a^b(\mathbb{R}^n)$, $1 < b$, contain test functions of compact support. Thus for generalized functions over these test function spaces the concept of a support can be defined as usual. Since the Fourier transformation maps the space $\mathcal{S}_a^b(\mathbb{R}^n)$ into $\mathcal{S}_b^a(\mathbb{R}^n)$, all three questions can be answered affirmatively for the spaces $\mathcal{S}_a^b(\mathbb{R}^n)$, $1 < a, b < \infty$.

According to Proposition 12.3 the smaller the parameters $a, b > 0$, the smaller the test function space $\mathcal{S}_a^b(\mathbb{R}^n)$, and thus the larger is the corresponding space of the generalized functions. Therefore, it is worthwhile to consider generalized functions over the spaces $\mathcal{S}_a^b(\mathbb{R}^n)$ with $0 < a \le 1$ and/or $0 < b \le 1$ too. However according to Proposition 12.3, elements of the spaces $\mathcal{S}_a^b(\mathbb{R}^n)$, $0 < b \le 1$ are analytic functions. Since there are no nontrivial analytic functions with compact support, the localization of the generalized functions with this test function space cannot be defined through




compactly supported test functions as in the case of Schwartz distributions. Thus it 
is not obvious how to define the concept of support in this case. 

The topological dual of a space of analytic functions is called a space of analytic 
functionals. As we are going to indicate, analytic functionals admit the concept of a 
carrier which is the counterpart of the concept of support of a Schwartz distribution. 

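Let $\Omega \subset \mathbb{C}^n$ be a nonempty open set, and consider the space $\mathcal{O}(\Omega)$ of holomorphic functions on $\Omega$ equipped with the system of seminorms

$$|f|_K = \sup_{z \in K} |f(z)|, \qquad K \subset \Omega \text{ compact}. \tag{12.13}$$

Since $\Omega$ can be exhausted by a sequence of compact sets, the space $\mathcal{O}(\Omega)$ is actually a Fréchet space. For $T \in \mathcal{O}(\Omega)'$ there are a constant $C$, $0 \le C < \infty$, and a compact set $K \subset \Omega$ such that

$$|T(f)| \le C\, |f|_K \qquad \forall f \in \mathcal{O}(\Omega). \tag{12.14}$$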


The compact set K of relation (12.14) is called a carrier of the analytic functional T. 
Naturally, one would like to proceed to define the support of an analytic functional as 
the smallest of its carriers. But in general this does not exist and thus the concept of 
support is not always available. In this context, it is worthwhile to recall the definition 
E’ of Schwartz distributions of compact support where the same type of topology is 
used. 

With regard to our three questions, the space $\mathcal{S}_1^1(\mathbb{R}^n)$ plays a distinguished role
since it is invariant under the Fourier transform and elements of its topological dual 
admit at least the concept of a carrier. As we will discuss in the next section, they 
actually admit the concept of support as the smallest carrier. 


12.2. Hyperfunctions and Fourier Hyperfunctions 


Recall the representation

$$T(x) = F_1(x + i\Gamma_1 0) + \cdots + F_m(x + i\Gamma_m 0) \tag{12.15}$$

of a distribution $T \in \mathcal{D}'(\Omega)$ as a finite sum of boundary values of certain holomorphic functions $F_1, \dots, F_m$, each of which satisfies a growth condition of the form (11.5). In a series of articles [8-10], M. Sato has shown how to give a precise mathematical meaning to a new class of generalized functions when in the above representation of distributions as a sum of boundary values of analytic functions all growth restrictions are dropped. For this, he used a cohomological method and called these new generalized functions hyperfunctions on $\Omega$. In this way a hyperfunction $T$ on $\Omega$ is identified with a class of $m$-tuples of holomorphic functions. When Eq. (12.15) holds, one calls $\{F_1, \dots, F_m\}$ defining functions of the hyperfunction $T$.




The space of all hyperfunctions on $\Omega$ is denoted by $\mathcal{B}(\Omega)$. From the above definition it is evident that it contains all Schwartz distributions on $\Omega$:

$$\mathcal{D}'(\Omega) \subset \mathcal{B}(\Omega).$$

It has to be emphasized that, in contrast to the other spaces of generalized functions we have discussed thus far, the space $\mathcal{B}(\Omega)$ is not defined as the topological dual of some test function space.

Spaces of hyperfunctions are well suited for a solution theory of linear differential operators with real analytic coefficients (see [11]). Consider for example the ordinary differential operator

$$P(x, D) = a_m(x) D^m + \cdots + a_1(x) D + a_0(x), \qquad D = \frac{d}{dx},$$

with $a_j$, $j = 0, 1, \dots, m$, real analytic functions on some open interval $\Omega \subset \mathbb{R}$, $a_m \neq 0$. In [11], it is shown how a comprehensive and transparent solution theory for

$$P(x, D) u(x) = T(x) \tag{12.16}$$

can be given in the space $\mathcal{B}(\Omega)$ of all hyperfunctions on $\Omega$, for any given $T \in \mathcal{B}(\Omega)$.

As in the case of Schwartz distributions, one can characterize the subspace of those hyperfunctions which admit the Fourier transformation as an isomorphism (for this, appropriate growth restrictions at infinity are needed). This subspace is called the space of Fourier hyperfunctions. Later the space of Fourier hyperfunctions on $\mathbb{R}^n$ was recognized as the topological dual of the test function space of rapidly decreasing analytic functions $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$, which is isomorphic to the space $\mathcal{S}_1^1(\mathbb{R}^n)$ introduced in the previous section. Briefly, the space $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$ can be described as follows (see [12]).

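First, we recall the radial compactification $\mathbb{D}^n$ of $\mathbb{R}^n$. Let $S_\infty^{n-1}$ be the $(n-1)$-dimensional sphere at infinity, which is homeomorphic to the unit sphere $S^{n-1} = \{x \in \mathbb{R}^n : |x| = 1\}$ by the mapping $x \mapsto x_\infty$, where the point $x_\infty \in S_\infty^{n-1}$ lies on the ray connecting the origin with the point $x \in S^{n-1}$. The set $\mathbb{R}^n \cup S_\infty^{n-1}$, equipped with its natural topology, is denoted by $\mathbb{D}^n$ and called the radial compactification of $\mathbb{R}^n$. Here a fundamental system of neighborhoods of $x_\infty$ is the set of all the sets $O_{\Omega,R}(x_\infty)$ given by

$$O_{\Omega,R}(x_\infty) = \{\xi \in \mathbb{R}^n : \xi/|\xi| \in \Omega,\ |\xi| > R\} \cup \{\xi_\infty : \xi \in \Omega\}$$

for every neighborhood $\Omega$ of $x$ in $S^{n-1}$ and every $R > 0$. Equip the space $\mathbb{Q}^n = \mathbb{D}^n \times i\mathbb{R}^n$ with its natural product topology. Clearly, $\mathbb{C}^n = \mathbb{R}^n \times i\mathbb{R}^n$ is embedded in $\mathbb{Q}^n$. Let $K$ be a compact set in $\mathbb{D}^n$, $\{U_m\}$ a fundamental system of neighborhoods of $K$ in $\mathbb{Q}^n$, and $\mathcal{O}_c^m(U_m)$ the Banach space of functions $f$ analytic in $U_m \cap \mathbb{C}^n$ and continuous on $\overline{U}_m \cap \mathbb{C}^n$ which satisfy

$$\|f\|_m = \sup_{z \in U_m \cap \mathbb{C}^n} |f(z)|\, e^{|z|/m} < \infty.$$

Finally we introduce the inductive limit of these Banach spaces of analytic functions

$$\underset{\sim}{\mathcal{O}}(K) = \operatorname*{ind\,lim}_{m \to \infty} \mathcal{O}_c^m(U_m).$$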


It has the following properties: 




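Proposition 12.4 Let $K \subset \mathbb{D}^n$ be compact. Then the space $\underset{\sim}{\mathcal{O}}(K)$ is a DFS-space (a dual Fréchet-Schwartz space), i.e., all the embedding mappings

$$\mathcal{O}_c^m(U_m) \to \mathcal{O}_c^{m+1}(U_{m+1}), \qquad m = 1, 2, \dots,$$

are compact.
The space $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$ is dense in $\underset{\sim}{\mathcal{O}}(K)$.
The Fourier transform $\mathcal{F}$ is well defined on $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$ by the standard formula

$$(\mathcal{F} f)(p) = (2\pi)^{-n/2} \int e^{i p \cdot x} f(x)\, dx.$$

It is an isomorphism of the topological vector space $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$.

Note that this inductive limit is not strict. Since $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$ is dense in $\underset{\sim}{\mathcal{O}}(K)$, continuous extensions from $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$ to $\underset{\sim}{\mathcal{O}}(K)$ are unique if they exist at all.

The topological dual $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)'$ of $\underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$ is called the space of Fourier hyperfunctions on $\mathbb{R}^n$.

Suppose $T \in \underset{\sim}{\mathcal{O}}(\mathbb{D}^n)'$ is a Fourier hyperfunction. Introduce the class $C(T)$ of all those compact subsets $K \subset \mathbb{D}^n$ such that $T$ has a continuous extension $T_K$ to $\underset{\sim}{\mathcal{O}}(K)$. As we have mentioned above, each $K \in C(T)$ is called a carrier of $T$.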

On the basis of the Mittag-Leffler theorem for rapidly decreasing analytic functions (see [12, 13]), one proves the following nontrivial result.


Lemma 12.1 For any $T \in \underset{\sim}{\mathcal{O}}(\mathbb{D}^n)'$ one has

$$K_1, K_2 \in C(T) \ \Rightarrow\ K_1 \cap K_2 \in C(T).$$


Corollary 12.1 Fourier hyperfunctions $T$ admit the concept of support, defined as the smallest carrier of $T$:

$$\operatorname{supp} T = \bigcap_{K \in C(T)} K.$$


The localization of Fourier hyperfunctions means that for every open nonempty subset $\Omega \subset \mathbb{R}^n$ one has the space of Fourier hyperfunctions on $\Omega$. This is summarized by stating that Fourier hyperfunctions form a (flabby) sheaf over $\mathbb{R}^n$ [11, 12].

Fourier hyperfunctions have an interesting and quite useful integral representation which uses analyticity of the test functions in a decisive way. For $j = 1, \dots, n$ introduce the open set $W_j = \{z \in \mathbb{Q}^n : \operatorname{Im} z_j \neq 0\}$. The intersection $W = \bigcap_{j=1}^n W_j$ of all these sets consists of $2^n$ open connected components of $\mathbb{Q}^n$ separated by the "real points." For every $z \in W$, introduce the function $h_z$ defined by

$$h_z(t) = \prod_{j=1}^n \frac{e^{-(t_j - z_j)^2}}{2\pi i\,(t_j - z_j)}.$$

One shows $h_z \in \underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$ for every $z \in W$. Hence, for every $T \in \underset{\sim}{\mathcal{O}}(\mathbb{D}^n)'$, we can define a function $\hat{T} : W \to \mathbb{C}$ by $\hat{T}(z) = T(h_z)$. It follows that $\hat{T}$ actually is a "slowly increasing" analytic function on $W$. Now given $f \in \underset{\sim}{\mathcal{O}}(\mathbb{D}^n)$, there is an $m \in \mathbb{N}$ such that $f \in \mathcal{O}_c^m(U_m)$. Hence, we can find $\delta_m > 0$ such that $\Gamma_1 \times \cdots \times \Gamma_n \subset U_m \cap W \cap \mathbb{C}^n$, where $\Gamma_j = \Gamma_j^+ + \Gamma_j^-$ and $\Gamma_j^{\pm} = \{z_j = \pm x_j \pm i\delta_m : -\infty < x_j < \infty\}$. Since $h_z$ is a modified Cauchy kernel with appropriate decay properties at infinity, an application of Cauchy's integral theorem implies

$$\int_{\Gamma_1 \times \cdots \times \Gamma_n} f(z)\, h_z(t)\, dz = f(t).$$

Now applying $T \in \underset{\sim}{\mathcal{O}}(\mathbb{D}^n)'$ to this identity, we get

$$\int_{\Gamma_1 \times \cdots \times \Gamma_n} f(z)\, \hat{T}(z)\, dz = T(f). \tag{12.17}$$

The integral on the left-hand side exists since $\hat{T}(z)$ is slowly increasing and $f(z)$ is "rapidly decreasing." Certainly one has to prove that the application of the Fourier hyperfunction $T$ "commutes" with integration so that $T$ can be applied to the integrand of this path integral. Then in Eq. (12.17) one has a very useful structure theorem for Fourier hyperfunctions: Every Fourier hyperfunction is represented by a path integral over a slowly increasing analytic function on $W$. In this way the powerful theory of analytic functions can be used in the analysis of Fourier hyperfunctions.

Most results known for (tempered) distributions have been extended to (Fourier) hyperfunctions. And certainly there are a number of interesting results which are characteristic for (Fourier) hyperfunctions and which are not available for distributions. From a structural point of view and for applications the most important difference between Schwartz distributions and hyperfunctions is that hyperfunctions can locally be of infinite order. For instance the infinite series

$$\sum_{n=1}^{\infty} a_n \delta^{(n)}, \qquad \lim_{n \to \infty} \left(|a_n|\, n!\right)^{1/n} = 0$$


has a precise meaning as a (Fourier) hyperfunction. Actually all hyperfunctions with 
support in {0} are of this form. Hence the set of hyperfunctions with support in 
{0} is much larger than the set of distributions with support in a point (compare 
Proposition 4.7). 
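
The growth condition on the coefficients $a_n$ can be explored numerically. The following is a minimal sketch, not taken from the text: the two coefficient families $a_n = 1/(n!)^2$ and $a_n = 1/n!$ and all function names are illustrative assumptions, and logarithms are used to avoid overflow of $n!$.

```python
from math import exp, lgamma

def root_quantity(log_abs_a, n):
    """Return (|a_n| * n!)^(1/n), computed via logarithms.

    log_abs_a is a function n -> log|a_n| (an illustrative interface).
    lgamma(n + 1) equals log(n!).
    """
    return exp((log_abs_a(n) + lgamma(n + 1)) / n)

def log_a_fast(n):
    """Hypothetical coefficients a_n = 1/(n!)^2."""
    return -2.0 * lgamma(n + 1)

def log_a_slow(n):
    """Hypothetical coefficients a_n = 1/n!."""
    return -lgamma(n + 1)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, root_quantity(log_a_fast, n), root_quantity(log_a_slow, n))
```

For $a_n = 1/(n!)^2$ the quantity $(|a_n|\, n!)^{1/n} = (n!)^{-1/n}$ decreases toward zero, so the condition is met; for $a_n = 1/n!$ it stays equal to 1, so the condition fails.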


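As an example consider the function $e^{-1/z}$, which is defined and holomorphic on $\mathbb{C} \setminus \{0\}$. Hence one can consider $e^{-1/z}$ as a defining function of a hyperfunction $[e^{-1/z}]$ with support in $\{0\}$; its explicit representation as an infinite series of the above type is derived in [12].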


In mathematical physics, Fourier hyperfunctions have been used successfully to extend the Gårding-Wightman formulation of relativistic quantum field theory considerably (see [14-16]). For other applications of hyperfunctions, we refer to the books [11, 12].




12.3. Ultradistributions 


The standard reference for this section is the article [17]. The theory of ultradistributions has been developed further in [18, 19]. Ultradistributions are special hyperfunctions and the space of all ultradistributions on an open set $\Omega \subset \mathbb{R}^n$ is the strong dual of a test function space, which is defined in terms of a sequence $(M_p)_{p \in \mathbb{N}_0}$ of positive numbers $M_p$ satisfying the following conditions:


(M1) Logarithmic convexity: $M_p^2 \le M_{p-1} M_{p+1}$ for all $p \in \mathbb{N}$.
(M2) Stability under ultradifferential operators (defined later): There are constants $C > 0$, $L > 1$ such that for all $p \in \mathbb{N}_0$,

$$M_p \le C L^p \min_{0 \le q \le p} M_q M_{p-q}.$$

(M3) Strong nonquasianalyticity: There is a constant $C > 0$ such that for all $p \in \mathbb{N}$,

$$\sum_{q=p+1}^{\infty} \frac{M_{q-1}}{M_q} \le C\, p\, \frac{M_p}{M_{p+1}}.$$


For special purposes, some weaker conditions suffice. Examples of sequences satisfying these conditions are the Gevrey sequences

$$M_p = (p!)^s \quad \text{or} \quad p^{ps} \quad \text{or} \quad \Gamma(1 + ps)$$

for $s > 1$.
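
As a plausibility check, conditions (M1) and (M2) can be tested for the Gevrey sequence $M_p = (p!)^s$ on a finite range of indices. The sketch below is illustrative only: it restricts to integer $s$ so that exact integer arithmetic can be used, and the constants $C = 1$, $L = 2^s$ in (M2) are our own choice, motivated by the binomial bound $\binom{p}{q} \le 2^p$. A finite check of this kind is of course no proof.

```python
from math import factorial

def gevrey(p: int, s: int) -> int:
    """Gevrey sequence M_p = (p!)^s, exact for integer s."""
    return factorial(p) ** s

def check_m1(s: int, p_max: int) -> bool:
    """(M1): M_p^2 <= M_{p-1} * M_{p+1} for 1 <= p <= p_max."""
    return all(
        gevrey(p, s) ** 2 <= gevrey(p - 1, s) * gevrey(p + 1, s)
        for p in range(1, p_max + 1)
    )

def check_m2(s: int, p_max: int) -> bool:
    """(M2) with the assumed constants C = 1 and L = 2**s."""
    big_l = 2 ** s
    return all(
        gevrey(p, s)
        <= big_l ** p * min(gevrey(q, s) * gevrey(p - q, s) for q in range(p + 1))
        for p in range(p_max + 1)
    )

if __name__ == "__main__":
    print(check_m1(2, 30), check_m2(2, 30))
```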

Now let $\Omega \subset \mathbb{R}^n$ be a nonempty open set. A function $f \in C^\infty(\Omega)$ is called an ultradifferentiable function of class $M_p$ if, and only if, on each compact set $K \subset \Omega$ the derivatives of $f$ are bounded according to the estimate

$$\|D^\alpha f\|_K = \sup_{x \in K} |D^\alpha f(x)| \le C r^{|\alpha|} M_{|\alpha|}, \qquad \alpha \in \mathbb{N}_0^n \tag{12.18}$$


for some positive constants $C$ and $r$. In order to make such a class of functions invariant under affine coordinate transformations, there are two ways to choose the constant $r$ and accordingly, we get two classes of ultradifferentiable functions: $f \in C^\infty(\Omega)$ is called an ultradifferentiable function of class $(M_p)$ (respectively of class $[M_p]$) if condition (12.18) holds for every $r > 0$ (respectively for some $r > 0$).

$\mathcal{E}^{(M_p)}(\Omega)$ ($\mathcal{E}^{[M_p]}(\Omega)$) denotes the space of all ultradifferentiable functions of class $(M_p)$ (of class $[M_p]$) on $\Omega$. The corresponding subspaces of all ultradifferentiable functions with compact support are denoted by $\mathcal{D}^{(M_p)}(\Omega)$, respectively $\mathcal{D}^{[M_p]}(\Omega)$. All these spaces can be equipped with natural locally convex topologies, using the construction of inductive and projective limits.

Under these topologies the functional analytic properties of these spaces are well known (Theorem 2.6 of [17]), and we can form their strong duals $\mathcal{E}^{(M_p)}(\Omega)'$, $\mathcal{E}^{[M_p]}(\Omega)'$, $\mathcal{D}^{(M_p)}(\Omega)'$, $\mathcal{D}^{[M_p]}(\Omega)'$.

$\mathcal{D}^{(M_p)}(\Omega)'$ ($\mathcal{D}^{[M_p]}(\Omega)'$) is called the space of ultradistributions of class $M_p$ of Beurling type (of Roumieu type) or of class $(M_p)$ (of class $[M_p]$).




Ultradistributions of class $(M_p)$ (of class $[M_p]$) each form a (soft) sheaf over $\mathbb{R}^n$. Multiplication by a function in $\mathcal{E}^{(M_p)}(\Omega)$ (in $\mathcal{E}^{[M_p]}(\Omega)$) acts as a sheaf homomorphism.

These spaces of ultradistributions have been studied as comprehensively as 
Schwartz distributions but they have found up to now nearly no applications in 
physics or mathematical physics. The spaces of ultradistributions are invariant under 
a by far larger class of partial differential operators than the corresponding spaces of 
Schwartz distributions, and this was one of the major motivations for the construction 
of the spaces of ultradistributions. Consider a differential operator of the form 


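$$P(x, D) = \sum_{|\alpha| \le m} a_\alpha(x) D^\alpha, \qquad a_\alpha \in \mathcal{E}^*(\Omega). \tag{12.19}$$

It defines a linear partial differential operator $P(x, D) : \mathcal{D}^*(\Omega)' \to \mathcal{D}^*(\Omega)'$ as the dual of the formal adjoint operator $P'(x, D)$ of the operator $P(x, D)$, which is a continuous linear operator $\mathcal{D}^*(\Omega) \to \mathcal{D}^*(\Omega)$. Here $*$ stands for either $(M_p)$ or $[M_p]$.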
In addition, certain partial differential operators of infinite order leave the spaces 
of ultradistributions invariant and thus provide the appropriate setting for a study of 
such operators. 

A partial differential operator of the form 


$$P(D) = \sum_{|\alpha| = 0}^{\infty} a_\alpha D^\alpha, \qquad a_\alpha \in \mathbb{C} \tag{12.20}$$

is called an ultradifferential operator of class $(M_p)$ (of class $[M_p]$) if there are constants $r$ and $C$ (for every $r > 0$ there is a constant $C$) such that

$$|a_\alpha| \le C r^{|\alpha|} / M_{|\alpha|}, \qquad |\alpha| = 0, 1, 2, \dots.$$

An ultradifferential operator of class $*$ maps the space of ultradistributions $\mathcal{D}^*(\Omega)'$ continuously into itself.


References 


1. Hörmander L. The analysis of linear partial differential operators 1. Distribution theory and Fourier analysis. Berlin: Springer-Verlag; 1983.

2. Hörmander L. The analysis of linear partial differential operators 2. Differential operators of constant coefficients. Berlin: Springer-Verlag; 1983.

3. Wightman AS, Gårding L. Fields as operator-valued distributions in relativistic quantum theory. Arkiv för Fysik. 1964;28:129-84.

4. Streater RF, Wightman AS. PCT, spin and statistics, and all that. New York: Benjamin; 1964. 

5. Jost R. The general theory of quantized fields. Providence: American Mathematical Society; 
1965. 

6. Bogolubov NN, et al. General principles of quantum field theory. Mathematical physics and 
applied mathematics. Vol. 10. Dordrecht: Kluwer Academic; 1990. 


7. Gel'fand IM, Šilov GE. Generalized functions II: spaces of fundamental and generalized functions. 2nd ed. New York: Academic; 1972.

8. Sato M. On a generalization of the concept of functions. Proc Japan Acad. 1958;34:126-130;34:604-608.

9. Sato M. Theory of hyperfunctions I. J Fac Sci, Univ Tokyo, Sect I. 1959;8:139-193.

10. Sato M. Theory of hyperfunctions II. J Fac Sci, Univ Tokyo, Sect I. 1960;8:387-437.

11. Komatsu H, editor. Hyperfunctions and pseudo-differential equations. Springer lecture notes 287. Berlin: Springer-Verlag; 1973.

12. Kaneko A. Introduction to hyperfunctions. Mathematics and its applications (Japanese series). Dordrecht: Kluwer Academic; 1988.

13. Nishimura T, Nagamachi S. On supports of Fourier hyperfunctions. Math Japonica. 1990;35:293-313.

14. Nagamachi S, Mugibayashi N. Hyperfunction quantum field theory. Commun Math Phys. 1976;46:119-34.

15. Brüning E, Nagamachi S. Hyperfunction quantum field theory: basic structural results. J Math Phys. 1989;30:2340-59.

16. Nagamachi S, Brüning E. Hyperfunction quantum field theory: analytic structure, modular aspects, and local observable algebras. J Math Phys. 2001;42(1):1-31.

17. Komatsu H. Ultradistributions I, Structure theorems and a characterization. J Fac Sci. Univ Tokyo, Sect IA, Math. 1973;20:25-105.

18. Komatsu H. Ultradistributions II, The kernel theorem and ultradistributions with support in a manifold. J Fac Sci. Univ Tokyo, Sect IA. 1977;24:607-28.

19. Komatsu H. Ultradistributions III. Vector valued ultradistributions and the theory of kernels. J Fac Sci. Univ Tokyo, Sect IA. 1982;29:653-717.


Chapter 13 
Sobolev Spaces 


13.1 Motivation 


As we will learn in the introduction to Part C on variational methods, all major 
developments in the calculus of variations were driven by concrete problems, mainly 
in physics. In these applications, the underlying Banach space is a suitable function 
space, depending on the context as we are going to see explicitly later. Major parts 
of the existence theory of solutions of nonlinear partial differential equations use 
variational methods (some are treated in Chap. 32). Many other applications can be 
found for instance in the book [1]. Here the function spaces which are used are often 
the so-called Sobolev spaces and the successful application of variational methods 
rests on various types of embeddings for these spaces. Accordingly we present here 
very briefly the classical aspects of the theory of Sobolev spaces as they are used 
in later applications. Some parts of our presentation will just be a brief sketch of 
important results; this applies in particular to the results on the approximation of 
elements of a Sobolev space by smooth functions. A comprehensive treatment can 
for instance be found in the books [2, 3] and a short introduction in [1]. 

We assume that the reader is familiar with the basic aspects of the theory of Lebesgue spaces and with Hölder's inequality.


13.2 Basic Definitions 


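Let $\Omega \subset \mathbb{R}^n$ be a nonempty open set, and for $k = 0, 1, 2, \dots$ and $1 \le p \le \infty$ introduce the vector space

$$C^{[k,p]}(\Omega) = \{u \in C^k(\Omega) : D^\alpha u \in L^p(\Omega),\ |\alpha| \le k\}.$$

Here $\alpha = (\alpha_1, \dots, \alpha_n)$ is an $n$-tuple of integers $\alpha_i = 0, 1, 2, \dots$, $|\alpha| = \sum_{i=1}^n \alpha_i$, and $D^\alpha u = \frac{\partial^{|\alpha|} u}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}$. On this vector space define a norm for $1 \le p < \infty$ by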


© Springer International Publishing Switzerland 2015 181 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_13 




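$$\|u\|_{k,p} = \Big( \sum_{|\alpha| \le k} \|D^\alpha u\|_{L^p(\Omega)}^p \Big)^{1/p} \tag{13.1}$$

and for $p = \infty$ by

$$\|u\|_{k,\infty} = \sum_{|\alpha| \le k} \|D^\alpha u\|_{L^\infty(\Omega)}. \tag{13.2}$$

The Sobolev space $W^{k,p}(\Omega)$ is by definition the completion of $C^{[k,p]}(\Omega)$ with respect to this norm. These Banach spaces are naturally embedded into each other according to

$$W^{k,p}(\Omega) \subset W^{k-1,p}(\Omega) \subset \cdots \subset W^{0,p}(\Omega) = L^p(\Omega).$$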


Since the Lebesgue spaces $L^p(\Omega)$ are separable for $1 \le p < \infty$ one can show that these Sobolev spaces are separable too. For $1 < p < \infty$ the spaces $L^p(\Omega)$ are reflexive, and it follows that for $1 < p < \infty$ the Sobolev spaces $W^{k,p}(\Omega)$ are separable reflexive Banach spaces.

There is another equivalent definition of the Sobolev spaces in terms of weak (or distributional) derivatives due to Meyers and Serrin (1964) [3, 4]:

$$W^{k,p}(\Omega) = \{f \in L^p(\Omega) : D^\alpha f \in L^p(\Omega) \text{ (weakly) for all } |\alpha| \le k\}. \tag{13.3}$$

Here $D^\alpha f$ stands for the weak derivative of $f$, i.e., for all $\phi \in C_c^\infty(\Omega)$ one has in the sense of Schwartz distributions on $\Omega$

$$\langle D^\alpha f, \phi \rangle = (-1)^{|\alpha|} \int f(x)\, D^\alpha \phi(x)\, dx.$$


Theorem 13.1 Equipped with the norms (13.1) respectively (13.2) the set $W^{k,p}(\Omega)$ is a Banach space. In the case $p = 2$ the space $W^{k,2}(\Omega) = H^k(\Omega)$ is actually a Hilbert space with the inner product

$$\langle f, g \rangle_{H^k(\Omega)} = \sum_{|\alpha| \le k} \int_\Omega \overline{D^\alpha f(x)} \cdot D^\alpha g(x)\, dx. \tag{13.4}$$

The spaces $W^{k,p}(\Omega)$ are called Sobolev spaces of order $(k, p)$.


Proof Since the space $L^p(\Omega)$ is a vector space, the set $W^{k,p}(\Omega)$ is a vector space too, as a subspace of $L^p(\Omega)$. The norm properties of $\|\cdot\|_{L^p(\Omega)}$ easily imply that $\|\cdot\|_{W^{k,p}(\Omega)}$ is also a norm.

The local Sobolev spaces $W^{k,p}_{\mathrm{loc}}(\Omega)$ are obtained when in the above construction the Lebesgue space $L^p(\Omega)$ is replaced by the local Lebesgue space $L^p_{\mathrm{loc}}(\Omega)$. Elements in a Sobolev space can be approximated by smooth functions, i.e., these spaces allow mollification. In detail one has the following result.


Theorem 13.2 Let $\Omega$ be an open subset of $\mathbb{R}^n$, $k \in \mathbb{N}_0 = \mathbb{N} \cup \{0\}$ and $1 \le p < \infty$. Then the following holds:

a) For $u \in W^{k,p}_{\mathrm{loc}}(\Omega)$ there exists a sequence $u_m \in C_c^\infty(\Omega)$ of $C^\infty$ functions on $\Omega$ with compact support such that $u_m \to u$ in $W^{k,p}_{\mathrm{loc}}(\Omega)$.

b) $C^\infty(\Omega) \cap W^{k,p}(\Omega)$ is dense in $W^{k,p}(\Omega)$.

c) $C_c^\infty(\mathbb{R}^n)$ is dense in $W^{k,p}(\mathbb{R}^n)$.


Proof Here we have to refer to the literature, for instance [2, 3]. 


Naturally, the space $C_c^\infty(\Omega)$ is contained in $W^{k,p}(\Omega)$ for all $k = 0, 1, 2, \dots$ and all $1 \le p < \infty$. The closure of this space in $W^{k,p}(\Omega)$ is denoted by $W_0^{k,p}(\Omega)$. In general $W_0^{k,p}(\Omega)$ is a proper subspace of $W^{k,p}(\Omega)$. For $\Omega = \mathbb{R}^n$ however, equality holds.

The fact that $W_0^{k,p}(\Omega)$ is, in general, a proper subspace of $W^{k,p}(\Omega)$ plays a decisive role in the formulation of boundary value problems. Roughly one can say the following: If the boundary $\Gamma = \partial\Omega$ is sufficiently smooth, then elements $u \in W^{k,p}(\Omega)$ together with their normal derivatives of order $\le k - 1$ can be restricted to $\Gamma$. And elements in $W_0^{k,p}(\Omega)$ can then be characterized by the fact that this restriction vanishes. (There is a fairly technical theory involved here [3].) A concrete example of a result of this type is the following theorem.


Theorem 13.3 Let $\Omega \subset \mathbb{R}^n$ be a bounded open subset whose boundary $\Gamma = \partial\Omega$ is piecewise $C^1$. Then the following holds:

(a) Every $u \in H^1(\Omega)$ has a restriction $\gamma_0 u = u|_\Gamma$ to the boundary;
(b) $H_0^1(\Omega) = \ker \gamma_0 = \{u \in H^1(\Omega) : \gamma_0(u) = 0\}$.


Obviously, the Sobolev space $W^{k,p}(\Omega)$ embeds naturally into the Lebesgue space $L^p(\Omega)$. Depending on the value of the exponent $p$ in relation to the dimension $n$ of the underlying space $\mathbb{R}^n$ it embeds also into various other function spaces, expressing various degrees of smoothness of elements in $W^{k,p}(\Omega)$. The following few sections present a number of (classical) estimates for elements in $W^{k,p}(\Omega)$ which then allow us to prove the main results concerning Sobolev embeddings, i.e., embeddings of the Sobolev spaces into various other function spaces.

A simple example shows what can be expected. Take $\psi \in C_c^\infty(\mathbb{R}^n)$ such that $\psi(x) = 1$ for all $|x| \le 1$ and define $f(x) = |x|^q \psi(x)$ for $x \in \mathbb{R}^n$, for some $q \in \mathbb{R}$. Then $\nabla f \in L^p(\mathbb{R}^n)^n$ requires $n + (q - 1)p > 0$, or

$$q > 1 - \frac{n}{p}.$$

Therefore, if $1 \le p < n$ then $q < 0$ is allowed and thus $f$ can have a singularity (at $x = 0$). If however $p > n$, then only exponents $q > 0$ are allowed, and then $f$ is continuous at $x = 0$. The following estimates give a much more accurate picture. These estimates imply first that we get continuous embeddings and at a later stage we will show that for exponents $1 \le p < n$ these embeddings are actually compact, if $\Omega$ is bounded.
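
The threshold in this example comes from the radial integral $\int_0^1 r^{(q-1)p + n - 1}\, dr$, which is finite exactly when $n + (q-1)p > 0$. The following short sketch (function names are ours, chosen for illustration) encodes the criterion and evaluates the truncated integral in closed form so that its behavior as the cutoff $\varepsilon \to 0$ can be inspected.

```python
def gradient_in_lp(q: float, n: int, p: float) -> bool:
    """True when |x|^(q-1) is p-integrable near 0 in R^n, i.e. n + (q-1)p > 0."""
    return n + (q - 1) * p > 0

def radial_integral(q: float, n: int, p: float, eps: float) -> float:
    """Closed form of the integral of r^((q-1)p + n - 1) over [eps, 1].

    Assumes k = n + (q-1)p is nonzero (the borderline case is logarithmic).
    """
    k = n + (q - 1) * p
    return (1.0 - eps ** k) / k

if __name__ == "__main__":
    # n = 3, p = 2: the threshold is q > 1 - 3/2 = -1/2.
    for q in (-0.25, -0.75):
        print(q, gradient_in_lp(q, 3, 2), radial_integral(q, 3, 2, 1e-12))
```

For $n = 3$, $p = 2$ the exponent $q = -0.25$ lies above the threshold and the truncated integral stays bounded as $\varepsilon \to 0$, while for $q = -0.75$ it grows without bound.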


13.3. The Basic Estimates 


13.3.1 Morrey’s Inequality 


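We start with the case $n < p \le +\infty$. Denote the unit sphere in $\mathbb{R}^n$ by $S$ and introduce for a Borel measurable set $\Gamma \subset S$ with $\sigma(\Gamma) > 0$ ($\sigma(\Gamma)$ denotes the surface measure of $\Gamma$) the sets

$$\Gamma_{x,r} = \{x + t\omega : \omega \in \Gamma,\ 0 \le t \le r\}, \qquad x \in \mathbb{R}^n,\ r > 0.$$

$\Gamma_{x,r}$ is the set of all lines of length $r$ from $x$ in the direction $\omega \in \Gamma$. Note that for measurable functions $f$ one has

$$\int_{\Gamma_{x,r}} f(y)\, dy = \int_0^r dt\, t^{n-1} \int_\Gamma f(x + t\omega)\, d\sigma(\omega). \tag{13.5}$$

Choosing $f = 1$ we find for the Lebesgue measure of $\Gamma_{x,r}$:

$$|\Gamma_{x,r}| = r^n \sigma(\Gamma)/n. \tag{13.6}$$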
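
A quick consistency check of (13.6): for $\Gamma = S$ the set $\Gamma_{x,r}$ is the full ball of radius $r$, so (13.6) must reproduce the classical volume of the ball. The sketch below uses the standard formulas $\sigma(S) = 2\pi^{n/2}/\Gamma(n/2)$ and $|B(0,r)| = \pi^{n/2} r^n/\Gamma(n/2+1)$, with $\Gamma(\cdot)$ here the Gamma function; the function names are illustrative.

```python
from math import gamma, isclose, pi

def sphere_area(n: int) -> float:
    """Surface measure of the unit sphere in R^n."""
    return 2.0 * pi ** (n / 2) / gamma(n / 2)

def ball_volume(n: int, r: float) -> float:
    """Lebesgue measure of the ball of radius r in R^n."""
    return pi ** (n / 2) / gamma(n / 2 + 1) * r ** n

def cone_measure(sigma: float, n: int, r: float) -> float:
    """Right-hand side of (13.6): r^n * sigma(Gamma) / n."""
    return r ** n * sigma / n

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, cone_measure(sphere_area(n), n, 1.7), ball_volume(n, 1.7))
```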


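Lemma 13.1 If $\Gamma$, $x$, $r$ are as above and $u \in C^1(\overline{\Gamma}_{x,r})$ then

$$\int_{\Gamma_{x,r}} |u(y) - u(x)|\, dy \le \frac{r^n}{n} \int_{\Gamma_{x,r}} \frac{|\nabla u(y)|}{|x - y|^{n-1}}\, dy. \tag{13.7}$$


Proof For $y = x + t\omega$, $0 \le t \le r$, and $\omega \in \Gamma$ one has

$$u(x + t\omega) - u(x) = \int_0^t \omega \cdot \nabla u(x + s\omega)\, ds,$$

thus integration over $\Gamma$ yields

$$\int_\Gamma |u(x + t\omega) - u(x)|\, d\sigma(\omega) \le \int_0^t \int_\Gamma |\nabla u(x + s\omega)|\, d\sigma(\omega)\, ds$$

$$= \int_0^t s^{n-1} \int_\Gamma \frac{|\nabla u(x + s\omega)|}{|x + s\omega - x|^{n-1}}\, d\sigma(\omega)\, ds$$

$$= \int_{\Gamma_{x,t}} \frac{|\nabla u(y)|}{|y - x|^{n-1}}\, dy \le \int_{\Gamma_{x,r}} \frac{|\nabla u(y)|}{|y - x|^{n-1}}\, dy.$$

If we multiply this inequality by $t^{n-1}$, integrate over $t$ from 0 to $r$ and use Eq. (13.5) we get (13.7).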




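Corollary 13.1 For any $n < p \le +\infty$, any $0 < r < \infty$, any $x \in \mathbb{R}^n$, and any Borel measurable subset $\Gamma \subset S$ such that $\sigma(\Gamma) > 0$, one has, for all $u \in C^1(\overline{\Gamma}_{x,r})$,

$$|u(x)| \le C(\sigma(\Gamma), r, n, p)\, \|u\|_{W^{1,p}(\Gamma_{x,r})} \tag{13.8}$$

with

$$C(\sigma(\Gamma), r, n, p) = \frac{r^{-n/p}}{\sigma(\Gamma)^{1/p}} \max\Big\{ n^{1/p},\ r \Big( \frac{p-1}{p-n} \Big)^{1 - 1/p} \Big\}.$$


Proof Clearly, $|u(x)| \le |u(y)| + |u(x) - u(y)|$ for any $y \in \Gamma_{x,r}$; integration over $\Gamma_{x,r}$ and application of (13.7) gives

$$|\Gamma_{x,r}|\, |u(x)| = \int_{\Gamma_{x,r}} |u(x)|\, dy \le \int_{\Gamma_{x,r}} |u(y)|\, dy + \frac{r^n}{n} \int_{\Gamma_{x,r}} \frac{|\nabla u(y)|}{|x - y|^{n-1}}\, dy.$$

Now apply Hölder's inequality¹ to continue this estimate by

$$\le \|u\|_{L^p(\Gamma_{x,r})} \|1\|_{L^q(\Gamma_{x,r})} + \frac{r^n}{n} \|\nabla u\|_{L^p(\Gamma_{x,r})} \Big\| \frac{1}{|x - \cdot|^{n-1}} \Big\|_{L^q(\Gamma_{x,r})} \tag{13.9}$$

where $q$ is the Hölder conjugate exponent of $p$, i.e., $q = \frac{p}{p-1}$. Calculate

$$\Big\| \frac{1}{|x - \cdot|^{n-1}} \Big\|_{L^q(\Gamma_{x,r})} = r^{1 - n/p} \Big( \sigma(\Gamma)\, \frac{p-1}{p-n} \Big)^{\frac{p-1}{p}} \tag{13.10}$$

and insert the result into (13.9). A rearrangement and a simple estimate finally gives (13.8).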


Corollary 13.2 Consider $n \in \mathbb{N}$ and $p \in (n, +\infty]$. There are constants $A = A_n$ and $B = B_n^{-1}$ ($B_n$ given by (13.12)) such that for any $u \in C^1(\mathbb{R}^n)$ and any $x, y \in \mathbb{R}^n$ one has ($r = |x - y|$, $B(x, r)$ is the ball with center $x$ and radius $r$)

$$|u(y) - u(x)| \le 2 B A^{1/p} \Big( \frac{p-1}{p-n} \Big)^{\frac{p-1}{p}} \|\nabla u\|_{L^p(B(x,r) \cap B(y,r))}\, |x - y|^{1 - \frac{n}{p}}. \tag{13.11}$$


Proof Certainly, the intersection $V = B(x, r) \cap B(y, r)$ of the two balls is not empty. Introduce the following subsets $\Gamma$, $\Lambda$ of the unit sphere in $\mathbb{R}^n$ by the requirement that $x + r\Gamma = (\partial B(x, r)) \cap B(y, r)$ and $y + r\Lambda = (\partial B(y, r)) \cap B(x, r)$, i.e., $\Gamma = \frac{1}{r}(\partial B(x, r) \cap B(y, r) - x)$ and $\Lambda = \frac{1}{r}(\partial B(y, r) \cap B(x, r) - y) = -\Gamma$. It is instructive to draw a picture of the sets introduced above (Fig. 13.1).


¹ This inequality says: whenever $f \in L^p$ and $g \in L^q$, $1 \le p \le \infty$, $\frac{1}{p} + \frac{1}{q} = 1$, then $f \cdot g \in L^1$ and $\|f \cdot g\|_1 \le \|f\|_p \|g\|_q$; for $p = 2 = q$ this is Schwarz' inequality.




Fig. 13.1 Intersecting balls and the related sets $\Gamma_{x,r}$ and $\Lambda_{y,r}$; here $W = \Gamma_{x,r} \cap \Lambda_{y,r}$ and $r = |x - y|$


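Since $\Gamma_{x,r} = r\Gamma_{x,1}$ and $\Lambda_{y,r} = r\Lambda_{y,1}$, we find that

$$B_n = \frac{|\Gamma_{x,r} \cap \Lambda_{y,r}|}{|\Gamma_{x,r}|} = \frac{|\Gamma_{x,1} \cap \Lambda_{y,1}|}{|\Gamma_{x,1}|} \tag{13.12}$$

is a number between 0 and 1 which only depends on the dimension $n$. It follows that $|\Gamma_{x,r}| = |\Lambda_{y,r}| = B_n^{-1} |W|$, $W = \Gamma_{x,r} \cap \Lambda_{y,r}$.
Now we estimate, using Lemma 13.1 and Hölder's inequality,

$$|u(x) - u(y)| \le \frac{1}{|W|} \int_W |u(x) - u(z)|\, dz + \frac{1}{|W|} \int_W |u(z) - u(y)|\, dz$$

$$\le \frac{1}{|W|} \int_{\Gamma_{x,r}} |u(x) - u(z)|\, dz + \frac{1}{|W|} \int_{\Lambda_{y,r}} |u(z) - u(y)|\, dz$$

$$\le \frac{r^n}{n |W|} \Big( \int_{\Gamma_{x,r}} \frac{|\nabla u(z)|}{|x - z|^{n-1}}\, dz + \int_{\Lambda_{y,r}} \frac{|\nabla u(z)|}{|y - z|^{n-1}}\, dz \Big)$$

$$\le \frac{r^n}{n |W|} \Big( \|\nabla u\|_{L^p(\Gamma_{x,r})} \Big\| \frac{1}{|x - \cdot|^{n-1}} \Big\|_{L^q(\Gamma_{x,r})} + \|\nabla u\|_{L^p(\Lambda_{y,r})} \Big\| \frac{1}{|y - \cdot|^{n-1}} \Big\|_{L^q(\Lambda_{y,r})} \Big)$$

$$\le \frac{2 r^n}{n |W|}\, \|\nabla u\|_{L^p(V)} \Big\| \frac{1}{|x - \cdot|^{n-1}} \Big\|_{L^q(\Gamma_{x,r})}.$$

Taking (13.10), (13.12), and (13.6) into account and recalling $r = |x - y|$, estimate (13.11) follows with $A = \sigma(\Gamma)^{-1}$.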


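Recall the definition of Hölder continuous functions: For $0 < \alpha \le 1$ and an open set $\Omega \subseteq \mathbb{R}^n$, $C^{0,\alpha}(\Omega)$ denotes the space of all bounded continuous functions $f$ on $\Omega$ for which

$$\sup_{x, y \in \Omega,\ x \neq y} \frac{|f(x) - f(y)|}{|x - y|^\alpha} < \infty.$$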


Theorem 13.4 (Morrey's Inequality) Suppose $n < p \le +\infty$ and $u \in W^{1,p}(\mathbb{R}^n)$. Then there is a unique version $u^*$ of $u$ (i.e., $u^* = u$ almost everywhere) which is Hölder continuous of exponent $1 - \frac{n}{p}$, i.e., $u^* \in C^{0,1-\frac{n}{p}}(\mathbb{R}^n)$ and satisfies

$$\|u^*\|_{C^{0,1-\frac{n}{p}}(\mathbb{R}^n)} \le C\, \|u\|_{W^{1,p}(\mathbb{R}^n)} \tag{13.13}$$

where $C = C(n, p)$ is a universal constant. In addition the estimates in (13.7), (13.8), and (13.11) hold when $u$ is replaced by $u^*$.


Proof At first consider the case $n < p < \infty$. For $u \in C_c^1(\mathbb{R}^n)$ Corollaries 13.1 and 13.2 imply ($C_b(\mathbb{R}^n)$ denotes the space of bounded continuous functions on $\mathbb{R}^n$)

$$\|u\|_{C_b(\mathbb{R}^n)} \le C\, \|u\|_{W^{1,p}(\mathbb{R}^n)} \qquad \text{and} \qquad \frac{|u(y) - u(x)|}{|y - x|^{1 - \frac{n}{p}}} \le C\, \|\nabla u\|_{L^p(\mathbb{R}^n)}.$$

This implies

$$\|u\|_{C^{0,1-\frac{n}{p}}(\mathbb{R}^n)} \le C\, \|u\|_{W^{1,p}(\mathbb{R}^n)}. \tag{13.14}$$

If $u \in W^{1,p}(\mathbb{R}^n)$ is given, there is a sequence of functions $u_j \in C_c^1(\mathbb{R}^n)$ such that $u_j \to u$ in $W^{1,p}(\mathbb{R}^n)$. Estimate (13.14) implies that this sequence is also a Cauchy sequence in $C^{0,1-\frac{n}{p}}(\mathbb{R}^n)$ and thus converges to a unique element $u^*$ in this space. Clearly Estimate (13.13) holds for this limit element $u^*$ and $u^* = u$ almost everywhere.
The case $p = \infty$ and $u \in W^{1,\infty}(\mathbb{R}^n)$ can be proven by using a similar approximation argument.


Corollary 13.3 (Morrey's Inequality) Let $\Omega$ be an open bounded subset of $\mathbb{R}^n$ with smooth boundary ($C^1$) and $n < p \le \infty$. Then for every $u \in W^{1,p}(\Omega)$ there exists a unique version $u^*$ in $C^{0,1-\frac{n}{p}}(\Omega)$ satisfying

$$\|u^*\|_{C^{0,1-\frac{n}{p}}(\Omega)} \le C\, \|u\|_{W^{1,p}(\Omega)} \tag{13.15}$$

with a universal constant $C = C(n, p, \Omega)$.


Proof Under the assumptions of the corollary the extension theorem for Sobolev spaces applies according to which elements in $W^{1,p}(\Omega)$ are extended to all of $\mathbb{R}^n$ by zero such that there exists a continuous extension operator $J : W^{1,p}(\Omega) \to W^{1,p}(\mathbb{R}^n)$ (see for instance Theorem 48.35 of [1]). Then, given $u \in W^{1,p}(\Omega)$, Theorem 13.4 implies that there is a continuous version $U^* \in C^{0,1-\frac{n}{p}}(\mathbb{R}^n)$ of $Ju$ which satisfies (13.13). Now define $u^* = U^*|_\Omega$. It follows

$$\|u^*\|_{C^{0,1-\frac{n}{p}}(\Omega)} \le \|U^*\|_{C^{0,1-\frac{n}{p}}(\mathbb{R}^n)} \le C\, \|Ju\|_{W^{1,p}(\mathbb{R}^n)} \le C\, \|u\|_{W^{1,p}(\Omega)}.$$


13.3.2. Gagliardo-Nirenberg-Sobolev Inequality 


This important inequality claims that

$$\|u\|_{L^q} \le C\, \|\nabla u\|_{L^p}, \qquad u \in C_c^1(\mathbb{R}^n) \tag{13.16}$$

for a suitable exponent $q$ depending on the given exponent $p$, $1 \le p < n$. This exponent is easily determined through the scale covariance of the quantities in this inequality. For $\lambda > 0$ introduce $u_\lambda$ by setting $u_\lambda(x) = u(\lambda x)$. A simple calculation shows $\|u_\lambda\|_{L^q} = \lambda^{-n/q} \|u\|_{L^q}$ and $\|\nabla u_\lambda\|_{L^p} = \lambda^{1 - n/p} \|\nabla u\|_{L^p}$. Thus inserting $u_\lambda$ into (13.16) gives

$$\lambda^{-n/q} \|u\|_{L^q} \le C \lambda^{1 - n/p} \|\nabla u\|_{L^p}$$

for all $\lambda > 0$. This is possible for all $u \in C_c^1(\mathbb{R}^n)$ only if

$$1 - n/p + n/q = 0, \qquad \text{i.e.,} \qquad \frac{1}{q} = \frac{1}{p} - \frac{1}{n}. \tag{13.17}$$

It is a standard notation to denote the exponent $q$ which solves (13.17) by $p^*$, i.e.,

$$p^* = \frac{np}{n - p},$$

with the convention that $p^* = \infty$ if $p = n$.
As we will show later, the case $1 < p < n$ can easily be reduced to the case $p = 1$, thus we prove this inequality for $p = 1$, i.e., $p^* = 1^* = \frac{n}{n-1}$.
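
The scaling argument can be checked with exact rational arithmetic. The sketch below (function names are illustrative) computes $p^*$ from $p$ and $n$ and evaluates the "scaling defect" $1 - n/p + n/q$, which must vanish exactly for $q = p^*$ and is nonzero for any other exponent.

```python
from fractions import Fraction

def sobolev_conjugate(p, n: int) -> Fraction:
    """Return p* = n p / (n - p) for 1 <= p < n, as an exact fraction."""
    p = Fraction(p)
    if not (1 <= p < n):
        raise ValueError("requires 1 <= p < n")
    return n * p / (n - p)

def scaling_defect(p, q, n: int) -> Fraction:
    """Exponent 1 - n/p + n/q of lambda left over after inserting u_lambda."""
    return 1 - Fraction(n) / Fraction(p) + Fraction(n) / Fraction(q)

if __name__ == "__main__":
    n = 3
    for p in (1, Fraction(3, 2), 2):
        q = sobolev_conjugate(p, n)
        print(p, q, scaling_defect(p, q, n))
```

For $n = 3$ this gives $1^* = 3/2$ and $2^* = 6$, in agreement with the exponents used in the proofs that follow.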


Theorem 13.5 For all $u \in W^{1,1}(\mathbb{R}^n)$ one has

$$\|u\|_{1^*} = \|u\|_{\frac{n}{n-1}} \le \prod_{i=1}^n \Big( \int_{\mathbb{R}^n} |\partial_i u(x)|\, dx \Big)^{\frac{1}{n}} \le n^{-\frac{1}{2}} \|\nabla u\|_1. \tag{13.18}$$


Proof According to Theorem 13.2 every element $u \in W^{1,1}(\mathbb{R}^n)$ is the limit of a sequence of elements $u_j \in C_c^1(\mathbb{R}^n)$. Hence we only need to prove this inequality for $u \in C_c^1(\mathbb{R}^n)$, and this is done by induction on the dimension $n$.

We suggest that the reader proves this inequality for $n = 1$ and $n = 2$. Here we present first the case $n = 3$ before we come to the general case.




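Suppose that $u \in C_c^1(\mathbb{R}^3)$ is given. Observe that now $1^* = 3/2$. Introduce the notation $x^1 = (y_1, x_2, x_3)$, $x^2 = (x_1, y_2, x_3)$, and $x^3 = (x_1, x_2, y_3)$. The fundamental theorem of calculus implies for $i = 1, 2, 3$

$$|u(x)| \le \int_{-\infty}^{x_i} |\partial_i u(x^i)|\, dy_i \le \int_{-\infty}^{\infty} |\partial_i u(x^i)|\, dy_i,$$

hence multiplication of these three inequalities gives

$$|u(x)|^{\frac{3}{2}} \le \prod_{i=1}^3 \Big( \int_{-\infty}^{\infty} |\partial_i u(x^i)|\, dy_i \Big)^{\frac{1}{2}}.$$

Now integrate this inequality with respect to $x_1$ and note that the first factor on the right does not depend on $x_1$:

$$\int_{\mathbb{R}} |u(x)|^{\frac{3}{2}}\, dx_1 \le \Big( \int_{-\infty}^{\infty} |\partial_1 u(x^1)|\, dy_1 \Big)^{\frac{1}{2}} \int_{\mathbb{R}} \prod_{i=2}^3 \Big( \int_{-\infty}^{\infty} |\partial_i u(x^i)|\, dy_i \Big)^{\frac{1}{2}} dx_1.$$

Apply Hölder's inequality (for $p = q = 2$) to the second integral; this gives the estimate

$$\le \Big( \int_{-\infty}^{\infty} |\partial_1 u(x^1)|\, dy_1 \Big)^{\frac{1}{2}} \prod_{i=2}^3 \Big( \int_{\mathbb{R}} \int_{-\infty}^{\infty} |\partial_i u(x^i)|\, dx_1\, dy_i \Big)^{\frac{1}{2}}.$$

Next we integrate this inequality with respect to $x_2$ and apply again Hölder's inequality to get

$$\int_{\mathbb{R}^2} |u(x)|^{\frac{3}{2}}\, dx_1\, dx_2 \le \Big( \int_{\mathbb{R}^2} |\partial_2 u(x^2)|\, dx_1\, dy_2 \Big)^{\frac{1}{2}} \int_{\mathbb{R}} \Big( \int_{-\infty}^{\infty} |\partial_1 u(x^1)|\, dy_1 \Big)^{\frac{1}{2}} \Big( \int_{\mathbb{R}^2} |\partial_3 u(x^3)|\, dx_1\, dy_3 \Big)^{\frac{1}{2}} dx_2$$

$$\le \Big( \int_{\mathbb{R}^2} |\partial_2 u(x^2)|\, dx_1\, dy_2 \Big)^{\frac{1}{2}} \Big( \int_{\mathbb{R}^2} |\partial_1 u(x^1)|\, dy_1\, dx_2 \Big)^{\frac{1}{2}} \Big( \int_{\mathbb{R}^3} |\partial_3 u(x^3)|\, dx_1\, dx_2\, dy_3 \Big)^{\frac{1}{2}}.$$

A final integration with respect to $x_3$ and applying Hölder's inequality as above implies

$$\int_{\mathbb{R}^3} |u(x)|^{\frac{3}{2}}\, dx_1\, dx_2\, dx_3 \le \Big( \int_{\mathbb{R}^3} |\partial_1 u(x^1)|\, dy_1\, dx_2\, dx_3 \Big)^{\frac{1}{2}} \Big( \int_{\mathbb{R}^3} |\partial_2 u(x^2)|\, dx_1\, dy_2\, dx_3 \Big)^{\frac{1}{2}} \Big( \int_{\mathbb{R}^3} |\partial_3 u(x^3)|\, dx_1\, dx_2\, dy_3 \Big)^{\frac{1}{2}}$$

$$= \prod_{i=1}^3 \Big( \int_{\mathbb{R}^3} |\partial_i u(x)|\, dx_1\, dx_2\, dx_3 \Big)^{\frac{1}{2}} \le \Big( \int_{\mathbb{R}^3} |\nabla u(x)|\, dx_1\, dx_2\, dx_3 \Big)^{\frac{3}{2}},$$

which is the claimed inequality for $n = 3$.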
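The general case uses the same strategy. Naturally some more steps are necessary.

Now we have $1^* = \frac{n}{n-1}$. For $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ introduce the variables $x^i = (x_1, \dots, x_{i-1}, y_i, x_{i+1}, \dots, x_n)$. The fundamental theorem of calculus implies for $i = 1, \dots, n$

$$|u(x)| \le \int_{\mathbb{R}} |\partial_i u(x^i)|\, dy_i$$

and thus

$$|u(x)|^{\frac{n}{n-1}} \le \prod_{i=1}^n \Big( \int_{\mathbb{R}} |\partial_i u(x^i)|\, dy_i \Big)^{\frac{1}{n-1}}. \tag{13.19}$$

Recall Hölder's inequality for the product of $n - 1$ functions in the form

$$\Big\| \prod_{j=2}^n f_j \Big\|_1 \le \prod_{j=2}^n \|f_j\|_{n-1} \tag{13.20}$$

and integrate (13.19) with respect to $x_1$ to get

$$\int_{\mathbb{R}} |u(x)|^{\frac{n}{n-1}}\, dx_1 \le \Big( \int_{\mathbb{R}} |\partial_1 u(x^1)|\, dy_1 \Big)^{\frac{1}{n-1}} \int_{\mathbb{R}} \prod_{i=2}^n \Big( \int_{\mathbb{R}} |\partial_i u(x^i)|\, dy_i \Big)^{\frac{1}{n-1}} dx_1$$

$$\le \Big( \int_{\mathbb{R}} |\partial_1 u(x^1)|\, dy_1 \Big)^{\frac{1}{n-1}} \prod_{i=2}^n \Big( \int_{\mathbb{R}^2} |\partial_i u(x^i)|\, dx_1\, dy_i \Big)^{\frac{1}{n-1}}$$

$$= \Big( \int_{\mathbb{R}} |\partial_1 u(x^1)|\, dy_1 \Big)^{\frac{1}{n-1}} \Big( \int_{\mathbb{R}^2} |\partial_2 u(x^2)|\, dx_1\, dy_2 \Big)^{\frac{1}{n-1}} \prod_{i=3}^n \Big( \int_{\mathbb{R}^2} |\partial_i u(x^i)|\, dx_1\, dy_i \Big)^{\frac{1}{n-1}}$$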


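where in the last step we isolated the $x_2$-independent term from the product. Now integrate this inequality with respect to $x_2$ and apply (13.20) again. This implies, after renaming the integration variables $y_1$ and $y_2$,

$$\int_{\mathbb{R}^2} |u(x)|^{\frac{n}{n-1}}\, dx_1\, dx_2 \le \Big( \int_{\mathbb{R}^2} |\partial_2 u(x)|\, dx_1\, dx_2 \Big)^{\frac{1}{n-1}} \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} |\partial_1 u(x)|\, dx_1 \Big)^{\frac{1}{n-1}} \prod_{i=3}^n \Big( \int_{\mathbb{R}^2} |\partial_i u(x^i)|\, dx_1\, dy_i \Big)^{\frac{1}{n-1}} dx_2$$

$$\le \Big( \int_{\mathbb{R}^2} |\partial_2 u(x)|\, dx_1\, dx_2 \Big)^{\frac{1}{n-1}} \Big( \int_{\mathbb{R}^2} |\partial_1 u(x)|\, dx_1\, dx_2 \Big)^{\frac{1}{n-1}} \prod_{i=3}^n \Big( \int_{\mathbb{R}^3} |\partial_i u(x^i)|\, dx_1\, dx_2\, dy_i \Big)^{\frac{1}{n-1}}$$

$$= \prod_{i=1}^2 \Big( \int_{\mathbb{R}^2} |\partial_i u(x)|\, dx_1\, dx_2 \Big)^{\frac{1}{n-1}} \times \prod_{i=3}^n \Big( \int_{\mathbb{R}^3} |\partial_i u(x^i)|\, dx_1\, dx_2\, dy_i \Big)^{\frac{1}{n-1}}.$$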


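Obviously, one can repeat these steps successively for $x_3, \ldots, x_n$ and one proves by
induction that for $k \in \{1, \ldots, n\}$ we get the estimate

$$\int_{\mathbb{R}^k} |u(x)|^{\frac{n}{n-1}} \, dx_1 \, dx_2 \cdots dx_k \le \prod_{i=1}^{k} \left( \int_{\mathbb{R}^k} |\partial_i u(x)| \, dx_1 \, dx_2 \cdots dx_k \right)^{\frac{1}{n-1}} \times \prod_{i=k+1}^{n} \left( \int_{\mathbb{R}^{k+1}} |\partial_i u(x^i)| \, dx_1 \, dx_2 \cdots dx_k \, dy_i \right)^{\frac{1}{n-1}}$$
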
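where naturally for $k = n$ the second product does not occur. Thus for $k = n$ one
has

$$\int_{\mathbb{R}^n} |u(x)|^{\frac{n}{n-1}} \, dx_1 \, dx_2 \cdots dx_n \le \prod_{i=1}^{n} \left( \int_{\mathbb{R}^n} |\partial_i u(x)| \, dx_1 \, dx_2 \cdots dx_n \right)^{\frac{1}{n-1}}.$$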


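In order to improve this estimate recall Young's inequality in the elementary form
$\prod_{i=1}^{n} A_i \le \frac{1}{n} \sum_{i=1}^{n} A_i^n$, where $A_i \ge 0$. Thus we get

$$\|u\|_{\frac{n}{n-1}} \le \prod_{i=1}^{n} \left( \int_{\mathbb{R}^n} |\partial_i u(x)| \, dx \right)^{\frac{1}{n}} \le \frac{1}{n} \sum_{i=1}^{n} \int_{\mathbb{R}^n} |\partial_i u(x)| \, dx$$

and by Hölder's inequality one knows $\sum_{i=1}^{n} |\partial_i u(x)| \le \sqrt{n} \, |\nabla u(x)|$, hence

$$\|u\|_{1^*} \le \frac{1}{\sqrt{n}} \|\nabla u\|_1.$$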
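As a numerical sanity check of the estimate for $p = 1$, the following sketch compares both sides for $n = 2$ (so $1^* = 2$) and the Gaussian $u(x) = e^{-|x|^2}$. The Gaussian is not compactly supported, but it belongs to $W^{1,1}(\mathbb{R}^2)$, so the estimate applies by density. The function name, the grid size, and the cut-off box are illustrative choices made here, not part of the text.

```python
import math

def gns_p1_check(n_grid=200, half_width=5.0):
    """Return (||u||_2, ||grad u||_1 / sqrt(2)) for u(x) = exp(-|x|^2) on R^2.

    Both integrals are approximated by a midpoint rule on the box
    [-half_width, half_width]^2; the Gaussian tail outside is negligible.
    """
    h = 2.0 * half_width / n_grid
    norm2_sq = 0.0   # accumulates the integral of |u|^2
    grad1 = 0.0      # accumulates the integral of |grad u|
    for i in range(n_grid):
        x = -half_width + (i + 0.5) * h
        for j in range(n_grid):
            y = -half_width + (j + 0.5) * h
            u = math.exp(-(x * x + y * y))
            norm2_sq += u * u * h * h
            # analytic gradient: grad u = -2 (x, y) u, so |grad u| = 2 |x| u
            grad1 += 2.0 * math.hypot(x, y) * u * h * h
    return math.sqrt(norm2_sq), grad1 / math.sqrt(2.0)

lhs, rhs = gns_p1_check()
print(lhs <= rhs)
```

For this particular function both sides are known in closed form, $\|u\|_2 = \sqrt{\pi/2}$ and $\|\nabla u\|_1 = \pi^{3/2}$, so the inequality holds with room to spare; the constant $1/\sqrt{n}$ is not claimed to be optimal.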


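Remark 13.1 The starting point of our estimates was the identity
$u(x) = \int_{-\infty}^{x_i} \partial_i u(x^i) \, dy_i$ and the resulting estimate

$$|u(x)| \le \int_{\mathbb{R}} |\partial_i u(x^i)| \, dy_i, \qquad i = 1, \ldots, n.$$

If we write

$$u(x) = \frac{1}{2} \left( \int_{-\infty}^{x_i} \partial_i u(x^i) \, dy_i - \int_{x_i}^{\infty} \partial_i u(x^i) \, dy_i \right)$$

we can improve this estimate to

$$|u(x)| \le \frac{1}{2} \int_{\mathbb{R}} |\partial_i u(x^i)| \, dy_i, \qquad i = 1, \ldots, n.$$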


Next we look at the case 1 < p <n. As we will see it can easily be reduced to 
the case p = 1. 




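Theorem 13.6 (Gagliardo-Nirenberg-Sobolev Inequality) If $1 < p < n$ then, for
all $u \in W^{1,p}(\mathbb{R}^n)$, with $p^* = \frac{np}{n-p}$,

$$\|u\|_{p^*} \le \frac{1}{\sqrt{n}} \, \frac{p(n-1)}{n-p} \, \|\nabla u\|_p. \tag{13.21}$$

Proof Since elements in $W^{1,p}(\mathbb{R}^n)$ can be approximated by elements in $C_c^1(\mathbb{R}^n)$ it
suffices to prove Estimate (13.21) for $u \in C_c^1(\mathbb{R}^n)$. For such a function $u$ consider the
function $v = |u|^s \in C_c^1(\mathbb{R}^n)$ for an exponent $s > 1$ to be determined later. We have
$\nabla v = s |u|^{s-1} \operatorname{sgn}(u) \nabla u$ and thus by applying (13.18) to $v$ we get

$$\big\| |u|^s \big\|_{1^*} \le \frac{1}{\sqrt{n}} \|\nabla v\|_1 = \frac{s}{\sqrt{n}} \big\| |u|^{s-1} \nabla u \big\|_1 \le \frac{s}{\sqrt{n}} \big\| |u|^{s-1} \big\|_q \, \|\nabla u\|_p \tag{13.22}$$

where $q$ is the Hölder conjugate exponent of $p$. Note that this estimate can be written
as

$$\|u\|_{s 1^*}^{s} \le \frac{s}{\sqrt{n}} \, \|u\|_{(s-1)q}^{s-1} \, \|\nabla u\|_p.$$

Now choose $s$ such that $s 1^* = (s - 1) q$. This gives $s = \frac{p^*}{1^*} = \frac{p(n-1)}{n-p}$ and accordingly
the last estimate can be written as

$$\|u\|_{p^*}^{s} \le \frac{s}{\sqrt{n}} \, \|u\|_{p^*}^{s-1} \, \|\nabla u\|_p.$$

Inserting the value $s = \frac{p(n-1)}{n-p}$ of $s$ now yields (13.21).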
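The bookkeeping of exponents in this proof is easy to get wrong, so here is a small exact-arithmetic sketch that recomputes $p^*$, $1^*$, the Hölder conjugate $q$, and the exponent $s$, and confirms that the choice of $s$ balances the two norms. The helper name `gns_exponents` is hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

def gns_exponents(n, p):
    """Return (p_star, one_star, q, s) for 1 < p < n, as exact fractions."""
    n, p = Fraction(n), Fraction(p)
    if not 1 < p < n:
        raise ValueError("need 1 < p < n")
    p_star = n * p / (n - p)        # Sobolev conjugate of p
    one_star = n / (n - 1)          # Sobolev conjugate of 1
    q = p / (p - 1)                 # Hoelder conjugate of p
    s = p * (n - 1) / (n - p)       # exponent used in v = |u|^s
    return p_star, one_star, q, s

p_star, one_star, q, s = gns_exponents(3, 2)
# The choice of s makes both exponents s*1^* and (s-1)*q equal to p^*.
balanced = (s * one_star == (s - 1) * q == p_star)
print(p_star, one_star, q, s, balanced)
```

For $n = 3$, $p = 2$ this gives $p^* = 6$ and $s = 4$, so the constant in (13.21) becomes $4/\sqrt{3}$.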


Corollary 13.4 Suppose that $\Omega \subset \mathbb{R}^n$ is a bounded open set with $C^1$-boundary.
Then for all $p \in [1, n)$ and $1 \le q \le p^*$ there is a constant $C = C(\Omega, p, q)$ such
that for all $u \in W^{1,p}(\Omega)$

$$\|u\|_q \le C \, \|u\|_{1,p}.$$


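Proof Under the given conditions on $\Omega$ one can show that every $u \in W^{1,p}(\Omega)$ has
an extension to $Ju \in W^{1,p}(\mathbb{R}^n)$ (i.e., $Ju|_\Omega = u$ and $J : W^{1,p}(\Omega) \to W^{1,p}(\mathbb{R}^n)$ is
continuous). Then for $u \in C^1(\overline{\Omega}) \cap W^{1,p}(\Omega)$

$$\|u\|_{L^{p^*}(\Omega)} \le \|Ju\|_{L^{p^*}(\mathbb{R}^n)} \le C \, \|\nabla (Ju)\|_{L^p(\mathbb{R}^n)} \le C \, \|u\|_{W^{1,p}(\Omega)}. \tag{13.23}$$

Since $C^1(\overline{\Omega})$ is dense in $W^{1,p}(\Omega)$, this estimate holds for all $u \in W^{1,p}(\Omega)$. If now
$1 \le q < p^*$ a simple application of Hölder's inequality gives

$$\|u\|_{L^q(\Omega)} \le \|u\|_{L^{p^*}(\Omega)} \, \|1\|_{L^s(\Omega)} = \|u\|_{L^{p^*}(\Omega)} \, |\Omega|^{1/s} \le C \, |\Omega|^{1/s} \, \|u\|_{W^{1,p}(\Omega)}$$

where $\frac{1}{p^*} + \frac{1}{s} = \frac{1}{q}$.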


13.4 Embeddings of Sobolev Spaces 


13.4.1 Continuous Embeddings 


In this short review of the classical theory of Sobolev spaces we can only discuss the
main embedding results. In the literature, one finds many additional cases.
For convenience of notation let us introduce, for a given number $r \ge 0$,

$$r_+ = \begin{cases} r & \text{if } r \notin \mathbb{N}_0, \\ r + \delta & \text{if } r \in \mathbb{N}_0, \end{cases}$$

where $\delta > 0$ is some arbitrary small number. For a number $r = k + \alpha$ with $k \in \mathbb{N}_0$
and $0 \le \alpha < 1$ we write $C^r(\overline{\Omega})$ for $C^{k,\alpha}(\overline{\Omega})$ (see List of Notation).


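Lemma 13.2 For $i \in \mathbb{N}$ and $p \ge n$ and $i > n/p$ (i.e., $i \ge 1$ if $p > n$ and $i \ge 2$ if
$p = n$) one has

$$W^{i,p}(\Omega) \hookrightarrow C^{i-(n/p)_+}(\overline{\Omega})$$

and there is a constant $C > 0$ such that for all $u \in W^{i,p}(\Omega)$

$$\|u\|_{C^{i-(n/p)_+}(\overline{\Omega})} \le C \, \|u\|_{i,p}. \tag{13.24}$$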


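Proof As earlier it suffices to prove (13.24) for $u \in C^\infty(\overline{\Omega})$. For such $u$ and $p > n$
and $|\alpha| \le i - 1$ apply Morrey's inequality to get

$$\|D^\alpha u\|_{C^{0,1-n/p}(\overline{\Omega})} \le C \, \|D^\alpha u\|_{1,p}$$

and therefore with $C^{i-n/p}(\overline{\Omega}) = C^{i-1,1-n/p}(\overline{\Omega})$, we get (13.24).
If $p = n$ (and thus $i \ge 2$) choose $q \in (1, n)$ close to $n$ so that $i > n/q$ and
$q^* = \frac{qn}{n-q} > n$. Then, by the first part of Theorem 13.7 and what we have just
shown

$$W^{i,n}(\Omega) \hookrightarrow W^{i,q}(\Omega) \hookrightarrow W^{i-1,q^*}(\Omega) \hookrightarrow C^{i-2,1-n/q^*}(\overline{\Omega}).$$

As $q \uparrow n$ implies $n/q^* \downarrow 0$, we conclude $W^{i,n}(\Omega) \hookrightarrow C^{i-2,\alpha}(\overline{\Omega})$ for any $\alpha \in (0, 1)$
which is written as

$$W^{i,n}(\Omega) \hookrightarrow C^{i-(n/n)_+}(\overline{\Omega}).$$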


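Theorem 13.7 (Sobolev Embedding Theorems) Assume that $\Omega = \mathbb{R}^n$ or that
$\Omega$ is a bounded open subset of $\mathbb{R}^n$ with a $C^1$-boundary; furthermore assume that
$1 \le p < \infty$ and $k, m \in \mathbb{N}$ with $m \le k$. Then one has:

(1) If $p < n/m$, then $W^{k,p}(\Omega) \hookrightarrow W^{k-m,q}(\Omega)$ for $q = \frac{np}{n-mp}$ or $\frac{1}{q} = \frac{1}{p} - \frac{m}{n} > 0$,
and there is a constant $C > 0$ such that

$$\|u\|_{k-m,q} \le C \, \|u\|_{k,p} \quad \text{for all } u \in W^{k,p}(\Omega). \tag{13.25}$$

(2) If $p > n/k$, then $W^{k,p}(\Omega) \hookrightarrow C^{k-(n/p)_+}(\overline{\Omega})$ and there is a constant $C > 0$ such
that

$$\|u\|_{C^{k-(n/p)_+}(\overline{\Omega})} \le C \, \|u\|_{k,p} \quad \text{for all } u \in W^{k,p}(\Omega). \tag{13.26}$$
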
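Proof Suppose $p < n/m$ and $u \in W^{k,p}(\Omega)$; then $D^\alpha u \in W^{1,p}(\Omega)$ for all $|\alpha| \le k - 1$.
Corollary 13.4 implies $D^\alpha u \in L^{p^*}(\Omega)$ for all $|\alpha| \le k - 1$ and therefore
$W^{k,p}(\Omega) \hookrightarrow W^{k-1,p^*}(\Omega)$ and there is a constant $C_1 > 0$ such that

$$\|u\|_{k-1,p_1} \le C_1 \, \|u\|_{k,p} \tag{13.27}$$

for all $u \in W^{k,p}(\Omega)$, with $p_1 = p^*$. Next define $p_j$, $j \ge 2$, inductively by $p_j = p_{j-1}^*$.
Thus $\frac{1}{p_j} = \frac{1}{p_{j-1}} - \frac{1}{n}$ and since $p < n/m$ we have $\frac{1}{p_m} = \frac{1}{p} - \frac{m}{n} > 0$. Therefore,
we can apply (13.27) repeatedly and find that the following inclusion maps are all
bounded:

$$W^{k,p}(\Omega) \hookrightarrow W^{k-1,p_1}(\Omega) \hookrightarrow W^{k-2,p_2}(\Omega) \hookrightarrow \cdots \hookrightarrow W^{k-m,p_m}(\Omega)$$

and part (1) follows.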
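The inductive definition $p_j = p_{j-1}^*$ can be traced with exact fractions. The sketch below, with a hypothetical helper `sobolev_chain`, lists the exponents along the chain of embeddings and lets one confirm the closed form $\frac{1}{p_m} = \frac{1}{p} - \frac{m}{n}$ for a sample choice of $n$, $p$, $m$.

```python
from fractions import Fraction

def sobolev_chain(n, p, m):
    """Exponents p_1, ..., p_m obtained from p_j = (p_{j-1})^*, assuming 1 <= p < n/m."""
    n, p = Fraction(n), Fraction(p)
    if not (p >= 1 and m >= 1 and p * m < n):
        raise ValueError("need 1 <= p < n/m")
    chain = []
    current = p
    for _ in range(m):
        current = n * current / (n - current)   # one application of r -> r^* = n r / (n - r)
        chain.append(current)
    return chain

chain = sobolev_chain(5, 1, 3)
print(chain)
```

With $n = 5$, $p = 1$, $m = 3$ the chain is $5/4, 5/3, 5/2$, and indeed $1/p_3 = 2/5 = 1 - 3/5$. Each step loses one derivative and gains integrability, which is exactly the trade expressed by (13.25).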

In order to prove part (2) consider $p > n/k$. For $p \ge n$ the statement follows from
Lemma 13.2. Now consider the case $n > p > n/k$ and choose the largest $m$ such that
$1 \le m < k$ and $n/m > p$. Define $q \ge n$ by $q = \frac{np}{n-mp}$ (i.e., $\frac{1}{q} = \frac{1}{p} - \frac{m}{n} > 0$). Then,
by what we have established above, the following inclusion maps are all bounded:

$$W^{k,p}(\Omega) \hookrightarrow W^{k-m,q}(\Omega) \hookrightarrow C^{k-m-(n/q)_+}(\overline{\Omega}) = C^{k-(n/p)_+}(\overline{\Omega})$$

which is the estimate of Part (2).


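In the case $p = 2$ and $\Omega = \mathbb{R}^n$ the Fourier transform $\mathcal{F}$ is a unitary operator
on $L^2(\mathbb{R}^n)$. This allows us to give a convenient characterization of the Sobolev space
$H^k(\mathbb{R}^n) = W^{k,2}(\mathbb{R}^n)$ and to prove a useful embedding result.

Recall that for $u \in H^k(\mathbb{R}^n)$ one has $\mathcal{F}(D^\alpha u)(p) = i^{|\alpha|} p^\alpha \mathcal{F}(u)(p)$. Hence we can
characterize this space by

$$\begin{aligned}
H^k(\mathbb{R}^n) &= \{u \in L^2(\mathbb{R}^n) : p^\alpha \mathcal{F}(u) \in L^2(\mathbb{R}^n), \ |\alpha| \le k\} \\
&= \{u \in L^2(\mathbb{R}^n) : (1 + p^2)^{k/2} \mathcal{F}(u) \in L^2(\mathbb{R}^n)\}.
\end{aligned}$$

This definition can be extended to arbitrary $s \in \mathbb{R}$ and thus we can introduce the
spaces

$$H^s(\mathbb{R}^n) = \{u \in L^2(\mathbb{R}^n) : (1 + p^2)^{s/2} \mathcal{F}(u) \in L^2(\mathbb{R}^n)\}.$$

As we are going to show, this space can be continuously embedded into the space

$$C_b^k(\mathbb{R}^n) = \Big\{ f \in C^k(\mathbb{R}^n) : \|f\|_{k,\infty} = \sup_{|\alpha| \le k} \, \sup_{x \in \mathbb{R}^n} |D^\alpha f(x)| < \infty \Big\}.$$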




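Theorem 13.8 For $k \in \mathbb{N}$ and $s > k + n/2$ the Sobolev space $H^s(\mathbb{R}^n)$ is
continuously embedded into the space $C_b^k(\mathbb{R}^n)$ and one has for all $u \in H^s(\mathbb{R}^n)$

$$\|u\|_{k,\infty} \le C \, \|u\|_{s,2}, \qquad \lim_{|x| \to \infty} |D^\alpha u(x)| = 0, \quad |\alpha| \le k.$$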


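Proof Recall that the Lemma of Riemann-Lebesgue says that the Fourier transform
of an $L^1(\mathbb{R}^n)$ function is continuous and vanishes at infinity. For $|\alpha| \le k$ and
$s > k + n/2$ one knows

$$\int_{\mathbb{R}^n} \frac{|p^\alpha|^2}{(1 + p^2)^s} \, dp = C_\alpha^2 < \infty.$$

Thus, for $u \in H^s(\mathbb{R}^n)$ we can estimate

$$\int_{\mathbb{R}^n} |p^\alpha (\mathcal{F}u)(p)| \, dp \le C_\alpha \left( \int_{\mathbb{R}^n} (1 + p^2)^s \, |\mathcal{F}u(p)|^2 \, dp \right)^{1/2} = C_\alpha \, \|u\|_{s,2}$$

and therefore for all $x \in \mathbb{R}^n$

$$|D^\alpha u(x)| = \Big| \int_{\mathbb{R}^n} e^{i p \cdot x} \, p^\alpha (\mathcal{F}u)(p) \, dp \Big| \le C_\alpha \, \|u\|_{s,2}.$$

It follows $\|u\|_{k,\infty} \le C \, \|u\|_{s,2}$ with $C = \max_{|\alpha| \le k} C_\alpha$. By applying the Lemma of
Riemann-Lebesgue we conclude.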
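For $n = 1$, $k = 0$, $s = 1$ the constant in this proof can be evaluated in closed form, $C_0^2 = \int_{\mathbb{R}} \frac{dp}{1 + p^2} = \pi$. The sketch below checks the resulting bound for the Gaussian $u(x) = e^{-x^2/2}$. It assumes the unitary normalization of the Fourier transform (a factor $(2\pi)^{-1/2}$, which the estimates above suppress), so the bound reads $\sup |u| \le \sqrt{\pi / (2\pi)} \, \|u\|_{1,2}$; by Plancherel, $\|u\|_{1,2}^2 = \|u\|_2^2 + \|u'\|_2^2$ is computed in position space. The function name and grid parameters are illustrative.

```python
import math

def h1_norm_gaussian(half_width=10.0, n_grid=4000):
    """Midpoint-rule value of (||u||_2^2 + ||u'||_2^2)^(1/2) for u(x) = exp(-x^2/2)."""
    h = 2.0 * half_width / n_grid
    total = 0.0
    for i in range(n_grid):
        x = -half_width + (i + 0.5) * h
        u = math.exp(-0.5 * x * x)
        du = -x * u                      # derivative of the Gaussian
        total += (u * u + du * du) * h
    return math.sqrt(total)

sup_u = 1.0                               # sup of exp(-x^2/2), attained at x = 0
# C_0 = sqrt(pi); the factor 1/sqrt(2 pi) comes from the assumed unitary normalization
bound = math.sqrt(math.pi / (2.0 * math.pi)) * h1_norm_gaussian()
print(sup_u <= bound)
```

Here $\|u\|_{1,2}^2 = \tfrac{3}{2}\sqrt{\pi}$, so the bound is $\big(\tfrac{3}{4}\sqrt{\pi}\big)^{1/2}$, slightly above the actual supremum.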


Remark 13.2 For $s > n/2$ the Sobolev spaces $H^s(\mathbb{R}^n)$ have the remarkable property
of being an algebra, i.e., if $u, v \in H^s(\mathbb{R}^n)$, then the point-wise product $u \cdot v$ belongs
to $H^s(\mathbb{R}^n)$ and $\|u \cdot v\|_{s,2} \le C_s \, \|u\|_{s,2} \, \|v\|_{s,2}$ for some constant $C_s$. The proof relies on
Young's convolution inequality² and the estimate $\|\mathcal{F}(u)\|_1 \le C \, \|u\|_{s,2}$ which holds for $s > n/2$ with
$C^2 = \int_{\mathbb{R}^n} \frac{dp}{(1 + p^2)^s} < \infty$. Because of this fact these spaces are used naturally in the study
of nonlinear second order differential equations. For further details see [5, 6, 7].


13.4.2. Compact Embeddings 


Here we show that some of the continuous embeddings established above are actually
compact, that is, they map bounded subsets into precompact sets. There are various
ways to prove these compactness results. We present a proof which is based on the
characterization of compact subsets $M \subset L^q(\mathbb{R}^n)$, due to Kolmogorov and Riesz
[8, 9].


² If $f \in L^p$ and $g \in L^q$, then the convolution $f * g$ belongs to $L^r$, $\frac{1}{p} + \frac{1}{q} = 1 + \frac{1}{r}$,
and $\|f * g\|_r \le \|f\|_p \, \|g\|_q$.



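Theorem 13.9 (Kolmogorov-Riesz Compactness Criterion) Suppose $1 \le q < \infty$.
Then a subset $M \subset L^q(\mathbb{R}^n)$ is precompact if, and only if, $M$ satisfies the following
three conditions:

(a) $M$ is bounded, i.e.,

$$\exists \, C < \infty \quad \forall f \in M : \quad \|f\|_q \le C;$$

(b)

$$\forall \varepsilon > 0 \quad \exists \, R < \infty \quad \forall f \in M : \quad \|\pi_R^\perp f\|_q < \varepsilon;$$

(c)

$$\forall \varepsilon > 0 \quad \exists \, r > 0 \quad \forall f \in M \quad \forall y \in \mathbb{R}^n, \ |y| < r : \quad \|\tau_y(f) - f\|_q < \varepsilon.$$

Here the following notation is used: $\pi_R^\perp$ is the operator of multiplication with the
characteristic function of the set $\{x \in \mathbb{R}^n : |x| > R\}$ and $\tau_y$ denotes the operator of
translation by $y \in \mathbb{R}^n$, i.e., $\tau_y(f)(x) = f(x + y)$.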
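To see why the tail condition (b) cannot be dropped, consider the translates $f_k(x) = e^{-(x-k)^2/2}$ in $L^2(\mathbb{R})$: they are bounded and equicontinuous in the mean (each is a translate of one fixed function), yet their mass escapes to infinity. The sketch below, with an illustrative helper `tail_norm`, computes the $L^2$ norm of $f_k$ outside a fixed ball for growing $k$.

```python
import math

def tail_norm(shift, radius, half_width=40.0, n_grid=16000):
    """L^2 norm of x -> exp(-(x - shift)^2 / 2) restricted to |x| > radius (midpoint rule)."""
    h = 2.0 * half_width / n_grid
    total = 0.0
    for i in range(n_grid):
        x = -half_width + (i + 0.5) * h
        if abs(x) > radius:
            total += math.exp(-(x - shift) ** 2) * h   # |f_k(x)|^2
    return math.sqrt(total)

full_norm = math.pi ** 0.25                      # exact L^2 norm of the Gaussian
tails = [tail_norm(k, radius=5.0) for k in (0, 5, 10, 20)]
print(tails, full_norm)
```

For fixed $R = 5$ the tail norm grows from essentially zero at $k = 0$ to the full norm $\pi^{1/4}$ for large $k$, so no single $R$ works for the whole family and the family is not precompact.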


Remark 13.3 If $\Omega \subset \mathbb{R}^n$ is an open bounded subset we can consider $L^q(\Omega)$ as
a subset of $L^q(\mathbb{R}^n)$ by extending all elements $f \in L^q(\Omega)$ by 0 to all of $\mathbb{R}^n$. Then
the above characterization also provides a characterization of precompact subsets
$M \subset L^q(\Omega)$ where naturally condition (b) is always satisfied and where in condition
(c) we have to use these extensions.

There are several versions of compact embedding results depending on the assumptions
on the domain $\Omega \subset \mathbb{R}^n$ which are used. The following version is already
quite comprehensive though there are several newer results of this type.


Theorem 13.10 (Rellich-Kondrachov Compactness Theorem) Let $\Omega \subset \mathbb{R}^n$ be
a bounded domain. Assume that the boundary of $\Omega$ is sufficiently smooth and that
$1 \le p < \infty$ and $k = 1, 2, \ldots$. Then the following holds:

(a) The following embeddings are compact:
(i) $kp < n$: $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$, $1 \le q < p^* = \frac{np}{n-kp}$;
(ii) $kp = n$: $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$, $1 \le q < \infty$;
(iii) $kp > n$: $W^{k,p}(\Omega) \hookrightarrow C_b^0(\Omega)$.
(b) For the subspaces $W_0^{k,p}(\Omega)$ the embeddings (i)-(iii) are compact for arbitrary
open sets $\Omega$.


Proof In some detail we present here only the proof of embedding (i) of part (a) for
$k = 1$. For the remaining proofs we refer to the specialized literature [2, 3].
According to Corollary 13.4 the inclusion mapping $W^{1,p}(\Omega) \hookrightarrow L^q(\Omega)$ is continuous
for $1 \le q \le p^*$. We have to show that every bounded subset $M \subset W^{1,p}(\Omega)$
is precompact in $L^q(\Omega)$ for $1 \le q < p^*$. This is done by the Kolmogorov-Riesz
compactness criterion. By Remark 13.3 only conditions (a) and (c) have to be verified
for $M$ considered as a subset of $L^q(\Omega)$. Since we know that this inclusion map is
continuous, it follows that $M$ is bounded in $L^q(\Omega)$ too and thus Condition (a) of the
Kolmogorov-Riesz criterion is verified and we are left with verifying Condition (c).




Observe that for $1 \le q < p^*$ Hölder's inequality implies

$$\|u\|_q \le \|u\|_1^{\alpha} \, \|u\|_{p^*}^{1-\alpha}, \qquad \alpha = \frac{1}{q} \, \frac{p^* - q}{p^* - 1} \in (0, 1].$$
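The exponent in this interpolation step is fixed by the standard requirement $\frac{1}{q} = \frac{\alpha}{1} + \frac{1 - \alpha}{p^*}$. The following sketch solves this relation exactly for a sample pair $(q, p^*)$; the helper name `interpolation_exponent` is an assumption made for illustration.

```python
from fractions import Fraction

def interpolation_exponent(q, p_star):
    """Solve 1/q = alpha/1 + (1 - alpha)/p_star for alpha, assuming 1 <= q < p_star."""
    q, p_star = Fraction(q), Fraction(p_star)
    if not 1 <= q < p_star:
        raise ValueError("need 1 <= q < p_star")
    return (p_star - q) / (q * (p_star - 1))

alpha = interpolation_exponent(2, 6)   # e.g. n = 3, p = 2, so p* = 6
print(alpha, alpha + (1 - alpha) / 6)
```

For $q = 2$, $p^* = 6$ one obtains $\alpha = 2/5$, and $\alpha = 1$ in the limiting case $q = 1$. Since $\alpha > 0$, smallness of the $L^1$ norm of a difference transfers to smallness in $L^q$ once the $L^{p^*}$ norm is controlled.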


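Now let $M \subset W^{1,p}(\Omega)$ be bounded; then this set is bounded in $L^{p^*}(\Omega)$ and hence
there is a constant $C < \infty$ such that for all $u \in M$ we have

$$\|u\|_q \le C \, \|u\|_1^{\alpha}$$

and it follows

$$\|\tau_y u - u\|_q \le 2C \, \|\tau_y u - u\|_1^{\alpha}, \qquad \forall u \in M \tag{13.28}$$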


where we assume that for $u \in W^{1,p}(\Omega)$ the translated element $\tau_y u$ is extended by
zero outside $\Omega$. Therefore, it suffices to verify condition (c) of Theorem 13.9 for the
norm $\|\cdot\|_1$. For $i = 1, 2, \ldots$ introduce the sets

$$\Omega_i = \{x \in \Omega : d(x, \partial\Omega) > 2/i\},$$

where $d(x, \partial\Omega)$ denotes the distance of the point $x$ from the boundary $\partial\Omega$ of $\Omega$.
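Another application of Hölder's inequality gives, for all $u \in M$,

$$\int_{\Omega \setminus \Omega_i} |u(x)| \, dx \le \left( \int_{\Omega \setminus \Omega_i} |u(x)|^{p^*} dx \right)^{1/p^*} \left( \int_{\Omega \setminus \Omega_i} dx \right)^{1 - \frac{1}{p^*}} \le \|u\|_{p^*} \, |\Omega \setminus \Omega_i|^{1 - \frac{1}{p^*}} \le C_M \, |\Omega \setminus \Omega_i|^{1 - \frac{1}{p^*}}$$

where $C_M$ is a bound for $M$ in $L^{p^*}(\Omega)$. Given $\varepsilon > 0$ we can therefore find $i_0 = i_0(\varepsilon)$
such that for $i \ge i_0$

$$\int_{\Omega \setminus \Omega_i} |u(x)| \, dx < \varepsilon/4$$

holds for all $u \in M$. Extend $u \in M$ outside $\Omega$ by 0 to get

$$\tilde{u}(x) = \begin{cases} u(x), & x \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$$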
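For a fixed $i \ge i_0$ and $y \in \mathbb{R}^n$, $|y| < 1/i$, we estimate

$$\begin{aligned}
\|\tau_y u - u\|_1 &= \int_{\Omega_i} |u(x + y) - u(x)| \, dx + \int_{\Omega \setminus \Omega_i} |\tilde{u}(x + y) - \tilde{u}(x)| \, dx \\
&\le \int_{\Omega_i} |u(x + y) - u(x)| \, dx + \varepsilon/2.
\end{aligned}$$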


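And the integral is estimated as follows ($p'$ denotes the Hölder conjugate exponent
of $p$):

$$\begin{aligned}
\int_{\Omega_i} |u(x + y) - u(x)| \, dx &= \int_{\Omega_i} \Big| \int_0^1 \frac{d}{dt} u(x + ty) \, dt \Big| \, dx = \int_{\Omega_i} \Big| \int_0^1 y \cdot \nabla u(x + ty) \, dt \Big| \, dx \le |y| \int_{\Omega_{2i}} |\nabla u(x)| \, dx \\
&\le |y| \, |\Omega_{2i}|^{1/p'} \, \|\nabla u\|_{L^p(\Omega_{2i})} \le |y| \, |\Omega_{2i}|^{1/p'} \, C \le |y| \, |\Omega|^{1/p'} \, C.
\end{aligned}$$
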


It follows that there is $r_0 > 0$ such that $\|\tau_y u - u\|_1 < \varepsilon$ for all $|y| < r_0$. By estimate
(13.28) we conclude that Condition (c) of Theorem 13.9 holds and therefore by this
theorem $M \subset W^{1,p}(\Omega)$ is precompact in $L^q(\Omega)$.


Remark 13.4 The general case of $W^{k,p}(\Omega)$ with $k > 1$ follows from the following
observation which can be proven similarly.

For $m \ge 1$ and $\frac{1}{q} > \frac{1}{p} - \frac{m}{n} > 0$ the inclusion of $W^{k,p}(\Omega)$ into $W^{k-m,q}(\Omega)$ is compact.


13.5 Exercises 


1. Poincaré's inequality for elements in $W_0^{1,p}(I)$, $I = [a, b]$ finite interval, $1 \le p < \infty$:
For $u \in W_0^{1,p}(I)$ prove $|u(x)| \le |x - a|^{1/p'} \, \|u'\|_p$ and conclude
$\|u\|_\infty \le |I|^{1/p'} \, \|u'\|_p$.

2. Prove Part c) of Theorem 13.2, i.e., prove that $C_c^\infty(\mathbb{R}^n)$ is dense in the Sobolev
space $W^{k,p}(\mathbb{R}^n)$ for $1 \le p < \infty$ and $k = 0, 1, 2, \ldots$.

Hints: Use suitable cut-off functions and regularization and take the techniques 
of proof in Proposition 7.4 and Theorem 7.1 into account. 

3. Prove the statements in Remark 13.2. 

4. Prove: For solutions of the time-independent classical field equation

$$-\Delta_n \phi(x) + m^2 \phi(x) = \phi^{2p-1}(x)$$

in the Sobolev space $H^1(\mathbb{R}^n)$ Sobolev's inequality states that the potential energy
of the solution $\|\phi\|_{2p}^{2p}$ is estimated from above by the kinetic energy $\|\nabla \phi\|_2^2$, for
$p = n/(n-2)$ if $n \ge 3$. Formulate and prove the corresponding statements for
$n = 2$ and $n = 1$.


References 


1. Driver BK. Analysis tools with applications. Berlin: Springer; 2003.

2. Adams RA. Sobolev spaces. Boston: Academic; 1975.

3. Adams RA, Fournier JF. Sobolev spaces. Amsterdam: Academic; 2003.

4. Meyers N, Serrin J. “H=W”. Proc Nat Acad Sci. 1964;51:1055-1056.

5. Strichartz R. A note on Sobolev algebras. Proc Am Math Soc. 1971;29(1):205-207.

6. Simpson H, Spector S. A product property of Sobolev spaces with application to elliptic
estimates. Rend Sem Mat Univ Padova. 2012 October.

7. Tao T. Lecture notes 3 for 254A. Tech Rep Department of Mathematics, UCLA, Los Angeles;
2010.

8. Kolmogorov AN. Über Kompaktheit der Funktionenmengen bei der Konvergenz im Mittel.
Nachr Ges Wiss Göttingen. 1931;9:60-63.

9. Riesz M. Sur les ensembles compacts de fonctions sommables. Acta Szeged Sect Math.
1933;6:136-142.




Part II 
Hilbert Space Operators 


Chapter 14 
Hilbert Spaces: A Brief Historical Introduction 


14.1 Survey: Hilbert Spaces 


The linear eigenvalue problem Au = Au in finite dimensional spaces was completely 
solved at the end of the nineteenth century. At the beginning of the twentieth century, 
the focus shifted to eigenvalue problems for certain linear partial differential operators 
of second order (e.g., Sturm—Liouville problems) and one realized quickly that these 
are eigenvalue problems in infinite dimensional spaces, which presented completely 
new properties and unexpected difficulties. 

In an attempt to use, by analogy, the insight gathered in the finite dimensional 
case, also in the infinite dimensional case, one started with the problem of expanding 
“arbitrary functions” in terms of systems of known functions according to the requirements
of the problem under consideration, for instance exponential functions,
Hermite functions, spherical functions, etc. The coefficients of such an expansion 
were viewed as the coordinates of the unknown function with respect to the given 
system of functions (V. Volterra, I. Fredholm, E. Schmidt). Clearly, in this context, 
many mathematical problems had to be faced, for instance: 


1. Which sequences of numbers can be interpreted as the sequence of coefficients 
of which functions? 

2. Which notion of convergence is suitable for such an expansion procedure? 

3. Which systems of functions, besides exponential and Hermite functions, can be 
used for such an expansion? 

4. Given a differential operator of the type mentioned above, how do we choose the 
system of functions for this expansion? 


Accordingly, we start our introduction into the theory of Hilbert spaces and their 
operators with some remarks on the history of this subject. The answers to the first 
two questions were given at the beginning of the twentieth century by D. Hilbert 
in his studies of linear integral equations. They became the paradigm for this type 
of problems. Hilbert suggested using the space $\ell^2(\mathbb{R})$ of all sequences $x = (x_i)_{i \in \mathbb{N}}$
of real numbers $x_i$ which are square summable and introduced new topological
concepts, which turned out to be very important later. Soon afterward, E. Schmidt, 
M. Fréchet, and F. Riesz gave Hilbert’s theory a more geometrical form, which 


© Springer International Publishing Switzerland 2015 201 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_14 




emphasized the analogy with the finite dimensional Euclidean spaces $\mathbb{R}^n$ and $\mathbb{C}^n$,
n = 1,2,.... This analogy is supported by the concept of an inner product or scalar 
product, which depends on the dimension of the space and provides the connection 
between the metric and geometric structures on the space. This is well known for 
Euclidean spaces, and one expects that the notions and results known from Euclidean 
space remain valid in general. Indeed, this turned out to be the case. We mention 
here the concepts of length, of angles, as well as orthogonality and results such as 
the theorem of Pythagoras, the theorem of diagonals, and Schwarz’ inequality. This 
will be discussed in the section on the geometry of Hilbert spaces. However we 
will follow more the axiomatic approach to the theory of Hilbert spaces which was 
developed later, mainly by J. von Neumann and F. Riesz. In this approach, a Hilbert 
space is defined as a vector space on which an inner product is defined in such a way 
that the space is complete with respect to the norm induced by the inner product. For 
details, see Chap. 15, “Inner product spaces and Hilbert spaces.” 

After the basic concepts of the theory of Hilbert spaces have been introduced, 
a systematic study of the consequences of the concept of orthogonality follows in 
the section on the geometry of Hilbert spaces. The main results are the “Projection 
Theorem” 16.1 and its major consequences. Here it is quite useful to keep the analogy 
with the Euclidean spaces in mind. Recall the direct orthogonal decomposition
$\mathbb{R}^n = \mathbb{R}^p \oplus \mathbb{R}^q$, $p + q = n$. This decomposition has a direct counterpart in a general Hilbert
space $\mathcal{H}$ and reads $\mathcal{H} = M \oplus M^\perp$ where $M$ is any closed linear subspace of $\mathcal{H}$ and
$M^\perp$ its “orthogonal complement.”

A very important consequence of this decomposition is the characterization of the 
continuous linear functionals on a Hilbert space (Theorem 16.3 of Riesz—Fréchet). 
According to this theorem, a Hilbert space $\mathcal{H}$ and its topological dual space $\mathcal{H}'$ (as the
space of all continuous linear functionals on $\mathcal{H}$) are “isometrically anti-isomorphic.”
Thus, in sharp contrast to the “duality theory” of a general complete normed space, 
the “duality theory” of a Hilbert space is nearly as simple as that of the Euclidean 
spaces. The reason is that the norm of a Hilbert space has a special form since it is 
defined by the inner product. 

The expansion problem mentioned above receives a comprehensive solution in the 
“theory of separable Hilbert spaces” which is based on the notions of an “orthonormal 
basis” and “Hilbert space basis” (Chap. 16, “Separable Hilbert spaces”). Certainly, 
in this context, it is important to have a characterization of an orthonormal basis and 
a method to construct such a basis (Gram—Schmidt orthonormalization procedure). 

Besides the sequence spaces $\ell^2(\mathbb{K})$, $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, examples of Hilbert spaces
which are important for us are the Lebesgue spaces $L^2(\Omega, dx)$ and the Sobolev
spaces $H^k(\Omega)$, $k = 1, 2, \ldots$, where $\Omega$ is a closed or an open subset of a Euclidean
space $\mathbb{R}^n$, $n = 1, 2, \ldots$. For some of the Lebesgue spaces, the problem of constructing
an orthonormal basis is discussed in detail. It turns out that the system of exponential
functions $e_n$,

$$e_n(x) = \frac{1}{\sqrt{2\pi}} \, e^{inx}, \qquad x \in [0, 2\pi], \quad n \in \mathbb{Z},$$

is an orthonormal basis of the Hilbert space $\mathcal{H} = L^2([0, 2\pi], dx)$. This means that
every “function” $f \in L^2([0, 2\pi], dx)$ has an expansion with respect to these basis
functions (Fourier expansion):

$$f = \sum_{n \in \mathbb{Z}} c_n e_n, \qquad c_n = \langle e_n, f \rangle_2 = \int_0^{2\pi} \overline{e_n(x)} \, f(x) \, dx.$$


Here, naturally, the series converges with respect to the topology of the Hilbert space
$L^2([0, 2\pi], dx)$. This shows that Fourier series can be dealt with in a simple and
natural way in the theory of Hilbert spaces.
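The coefficients of the Fourier expansion can be approximated numerically. The sketch below does this for the sample function $f(x) = x$ on $[0, 2\pi]$ with a midpoint rule and checks Bessel's inequality, $\sum_{|n| \le N} |c_n|^2 \le \|f\|_2^2$, with the gap shrinking as $N$ grows (which is what convergence in $L^2$ means here). The function names, the grid size, and the choice of $f$ are assumptions made for illustration.

```python
import cmath
import math

def fourier_coefficient(f, n, n_grid=2000):
    """Midpoint-rule approximation of c_n = <e_n, f> on [0, 2*pi]."""
    h = 2.0 * math.pi / n_grid
    total = 0.0 + 0.0j
    for j in range(n_grid):
        x = (j + 0.5) * h
        # complex conjugate of e_n(x) = exp(i n x) / sqrt(2 pi)
        total += cmath.exp(-1j * n * x) / math.sqrt(2.0 * math.pi) * f(x) * h
    return total

def bessel_sum(f, n_max):
    """Sum of |c_n|^2 over |n| <= n_max; Bessel's inequality bounds it by ||f||^2."""
    return sum(abs(fourier_coefficient(f, n)) ** 2 for n in range(-n_max, n_max + 1))

norm_sq = (2.0 * math.pi) ** 3 / 3.0          # exact value of ||f||^2 for f(x) = x
partial_5 = bessel_sum(lambda x: x, 5)
partial_20 = bessel_sum(lambda x: x, 20)
print(partial_5, partial_20, norm_sq)
```

For this $f$ the exact coefficients are $c_0 = \pi\sqrt{2\pi}$ and $c_n = i\sqrt{2\pi}/n$ for $n \ne 0$, so the gap $\|f\|_2^2 - \sum_{|n| \le N} |c_n|^2$ decays like $4\pi/N$.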

Next we construct an orthonormal basis for several “weighted Lebesgue spaces”
$L^2(I, \rho \, dx)$, for an interval $I = [a, b]$ and a weight function $\rho : I \to \mathbb{R}_+$. By
specializing the interval and the weight function one thus obtains several well-known
orthonormal systems of polynomials, namely the Hermite, Laguerre, and Legendre
polynomials.

We proceed with some remarks related to the second question. For the Euclidean
spaces $\mathbb{R}^n$ one has a characterization of compact sets which is simple and convenient
in applications: A subset $K \subset \mathbb{R}^n$ is compact if, and only if, it is bounded
and closed. However in an infinite dimensional Hilbert space, as for instance the
sequence space $\ell^2(\mathbb{R})$, a closed and bounded subset is not necessarily compact, with
respect to the “strong” or norm topology. This fact creates a number of new problems
unknown in finite dimensional spaces. D. Hilbert had recognized this, and therefore
he was looking for a weaker topology on the sequence space with respect to which
the above convenient characterization of compact sets would still be valid. He introduced
the “weak topology” and studied its main properties. We will discuss the
basic topological concepts for this weak topology and their relation to the corresponding
concepts for the strong topology. It turns out that a subset of a Hilbert
space is “weakly bounded,” i.e., bounded with respect to the weak topology, if, and
only if, it is “strongly bounded,” i.e., bounded with respect to the strong or norm
topology. This important result is based on the fundamental “principle of uniform
boundedness,” which is discussed in detail in Appendix C. An immediate
important consequence of the equivalence of weakly and strongly bounded sets is
that (strongly) bounded subsets of a Hilbert space are relatively sequentially compact
for the weak topology and this implies sequential completeness of Hilbert spaces for
the weak topology.
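The standard example behind these remarks is the sequence of canonical unit vectors $e_n$ in $\ell^2(\mathbb{R})$: it is bounded, any two distinct members are at distance $\sqrt{2}$ (so no subsequence converges in norm), yet $\langle x, e_n \rangle = x_n \to 0$ for every fixed $x \in \ell^2(\mathbb{R})$, i.e. $e_n \to 0$ weakly. The sketch below illustrates this with sequences truncated to a finite length; the truncation and all names are illustrative assumptions.

```python
import math

def inner(x, y):
    """Inner product of two real sequences truncated to the same finite length."""
    return sum(a * b for a, b in zip(x, y))

def unit_vector(n, length):
    """n-th canonical unit vector e_n (1-based index) truncated to the given length."""
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]

length = 200
x = [1.0 / k for k in range(1, length + 1)]        # a fixed square-summable sequence
# pairings <x, e_n> = x_n tend to zero as n grows
pairings = [inner(x, unit_vector(n, length)) for n in (1, 10, 100)]
# but distinct unit vectors stay a fixed distance apart in norm
difference = [a - b for a, b in zip(unit_vector(10, length), unit_vector(100, length))]
distance = math.sqrt(inner(difference, difference))
print(pairings, distance)
```

The pairings decay like $1/n$ while the mutual distance stays at $\sqrt{2}$, which is the simplest instance of a bounded set that is weakly but not strongly sequentially compact.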

After we have learned the basic facts about the geometrical and topological
structure of Hilbert spaces we study mappings between Hilbert spaces which are
compatible with the linear structure. These mappings are called “linear operators.”
A linear operator is specified by a linear subspace $D$ of a Hilbert space $\mathcal{H}$ and an
assignment $A$ which assigns to each point $x$ in $D$ a unique point $Ax$ in a Hilbert
space $\mathcal{K}$. This linear subspace $D$ is called the “domain of the operator.” If $\mathcal{K} = \mathcal{H}$
one speaks about a “linear operator in the Hilbert space $\mathcal{H}$,” otherwise about a “linear
operator from $\mathcal{H}$ into $\mathcal{K}$.” In order to indicate explicitly the dependence of a linear
operator on its domain, we write $\hat{A} = (D, A)$ for a linear operator with domain $D$
and assignment $A$. In this notation, it is evident that the same assignment on different
linear subspaces $D_1$ and $D_2$ defines different linear operators.

Observe that in the above definition of a linear operator, no continuity requirements
enter. If one takes also the topological structure of Hilbert spaces into account
one is led to the distinction of different classes of linear operators. Accordingly
we discuss in Chap. 19 “Linear operators” the definition and the characterization
of the following classes of linear operators: Bounded, unbounded, closed, closable,
and densely defined operators; for densely defined linear operators one proves the
existence of a unique “adjoint operator,” which allows one to distinguish between
the classes of “symmetric,” “essentially self-adjoint,” and “self-adjoint” operators.
In applications, for instance in quantum mechanics, it is often important to decide
whether a given linear operator is self-adjoint or not. Thus some criteria for
self-adjointness are presented and these are illustrated in a number of examples which
are of interest in quantum mechanics.

If for two linear operators $\hat{A}_i = (D_i, A_i)$, $i = 1, 2$, one knows $D_1 \subseteq D_2$ and
$A_1 x = A_2 x$ for all $x \in D_1$, one says that the linear operator $\hat{A}_2$ is an “extension”
of the linear operator $\hat{A}_1$, respectively that $\hat{A}_1$ is a “restriction” of $\hat{A}_2$. A standard
problem which occurs quite frequently is the following: Given a linear differential
operator on a space of “smooth” functions, construct all self-adjoint extensions of
this differential operator. Ideally one would like to prove that there is exactly one
self-adjoint extension (which one then could call the natural self-adjoint extension).
For the construction of self-adjoint extensions (for instance of a linear differential
operator) one can often use the “method of quadratic forms” since there is a fundamental
result which states that “semi-bounded self-adjoint operators” and “closed
semi-bounded densely defined quadratic forms” are in a one-to-one correspondence
(see Representation Theorems 21.2 and 21.3 of T. Kato). The method of quadratic
forms is also applied successfully to the definition of the sum of two unbounded
self-adjoint operators, even in some cases when the intersection of the domains of
the two operators is trivial, i.e. only contains the null vector. In this way one gets the
“form sum” of two unbounded operators.

Naturally, most of the problems addressed above do not occur for the class of 
“bounded” linear operators. Two bounded linear operators can be added in the standard
way since they are defined on the whole space, and they can be multiplied by
scalars, i.e. by numbers in K. Furthermore one can define a product of two such 
operators by the composition for mappings. Thus it turns out that the class of all 
bounded linear operators on a Hilbert space H is an algebra B(H), in a natural way. 
This algebra B(H) has a number of additional properties which make it the standard 
example of a noncommutative “C*-algebra.” On B(H), we consider three different 
topologies, the “uniform” or “operator-norm” topology, the “strong” topology, and
the “weak” topology and look at the relations between these topologies. 

The algebra B(H) contains several important classes of bounded linear operators.
Thus we discuss the class of “projection operators” or “projectors,” the class
of “isometries,” and the class of “unitary operators.” Projectors are in one-to-one
correspondence with closed subspaces of the Hilbert space. Isometric operators between
two Hilbert spaces do not change the metric properties of these spaces. The
class of unitary operators can be considered as the class of those operators between
Hilbert spaces which respect the linear, the metric, and the geometric structures. This
can be expressed by saying that unitary operators are those bijective linear operators
which do not change the inner products. As we will learn there is an important
connection between self-adjoint operators and “strongly continuous” one-parameter
groups of unitary operators $U(t)$, $t \in \mathbb{R}$: Such groups are “generated by self-adjoint
operators,” in analogy to the unitary group of complex numbers $z(t) = e^{iat}$, $t \in \mathbb{R}$,
which is “generated” by the real number $a$. The unitary groups and their relation to
self-adjoint operators play a very important role in quantum mechanics (time evolutions,
symmetries). Another class of bounded linear operators are the “trace class”
operators which are used in the form of “density matrices” in the description of states
for a quantum mechanical system. As an important application we present here the
“general uncertainty relations of Heisenberg.”

In more concrete terms and in greater detail we will discuss the above concepts
and results in the following section which is devoted to those self-adjoint operators
which play a fundamental role in the description of quantum systems, i.e. position,
momentum and energy or Hamilton operators. As in classical mechanics the Hamilton
operator of an interacting system is the “sum” of the operator corresponding to
the kinetic energy, the free Hamilton operator, and the operator describing the potential
energy. Typically both operators are unbounded and we are here in a concrete
situation of the problem of defining the “sum” of two unbounded self-adjoint operators.
A solution of this problem is due to T. Kato who suggested considering the
potential operator or interaction energy as a certain perturbation of the free Hamilton
operator (nowadays called “Kato perturbation”). In this way many self-adjoint
Hamilton operators can be constructed which are of fundamental importance for
quantum mechanics.

The final sections of the part “Hilbert Spaces” come back to the class of problems 
from which the theory of Hilbert spaces originated, namely finding “eigenvalues” 
of linear operators in Hilbert spaces. It turns out that in infinite dimensional Hilbert 
spaces the concept of an eigenvalue is too narrow for the complexity of the problem. 
As the suitable generalization of the set of all eigenvalues of linear maps in the finite 
dimensional case to the infinite dimensional setting, the concept of “spectrum” is 
used. In an infinite dimensional Hilbert space the spectrum of a self-adjoint operator 
can have a much richer structure than in the finite dimensional situation where it 
equals the set of all eigenvalues: Besides “eigenvalues of finite multiplicity” there 
can be “eigenvalues of infinite multiplicity” and a “continuous part,” i.e. a nonempty 
open interval can be contained in the spectrum. Accordingly the spectrum of a linear 
operator is divided into two parts, the “discrete spectrum” and the “essential spectrum.”
H. Weyl found a powerful characterization of the discrete and the essential
spectrum and he observed a remarkable stability of the essential spectrum under certain
perturbations of the operator: If the difference of the “resolvents” of two closed
linear operators is a “compact operator,” then both operators have the same essential
spectrum.

Recall the “spectral representation” of a symmetric $n \times n$ matrix. If
$\sigma(A) = \{\lambda_1, \ldots, \lambda_n\}$ are the eigenvalues of $A$ and $\{e_1, \ldots, e_n\} \subset \mathbb{R}^n$ the corresponding
orthonormal eigenvectors, the matrix $A$ has the spectral representation

$$A = \sum_{\lambda \in \sigma(A)} \lambda P_\lambda = \sum_{j=1}^{n} \lambda_j \, |e_j\rangle \langle e_j|$$

where $P_{\lambda_j} = |e_j\rangle \langle e_j|$ is the orthogonal projector onto the space spanned by the
eigenvector $e_j$, i.e. $P_{\lambda_j} x = \langle e_j, x \rangle e_j$ for all $x \in \mathbb{K}^n$.
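The finite dimensional spectral representation can be verified by hand for a $2 \times 2$ symmetric matrix, where eigenvalues and eigenvectors have closed forms. The sketch below builds the rank-one projectors $|e_j\rangle\langle e_j|$ and reassembles the matrix from them. The helper names are hypothetical, and the closed-form eigenvector formula assumes a nonzero off-diagonal entry.

```python
import math

def spectral_decomposition_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], assuming b != 0."""
    mean, radius = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        vx, vy = b, lam - a                    # solves (A - lam) v = 0 when b != 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

def reconstruct(pairs):
    """Sum of lam * |e><e| over all eigenpairs, returned as a nested list."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for lam, e in pairs:
        for r in range(2):
            for c in range(2):
                total[r][c] += lam * e[r] * e[c]   # entry (r, c) of lam * |e><e|
    return total

pairs = spectral_decomposition_2x2(2.0, 1.0, 2.0)
rebuilt = reconstruct(pairs)
print(pairs, rebuilt)
```

For the matrix with rows $(2, 1)$ and $(1, 2)$ the eigenvalues are $1$ and $3$ with eigenvectors $(1, -1)/\sqrt{2}$ and $(1, 1)/\sqrt{2}$, and the weighted sum of the two projectors returns the original matrix up to rounding.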

For a self-adjoint operator in an infinite dimensional Hilbert space one must 
take into account that the operator might have a nonempty continuous spectrum 
and accordingly the general version of the spectral representation of a self-adjoint 
operator A should be, in analogy with the finite dimensional case, 


$$A = \int_{\sigma(A)} \lambda \, dP_\lambda. \tag{14.1}$$


The proof of the validity of such a spectral representation for general self-adjoint 
operators needs a number of preparations which we will give in considerable detail. 
The proof of the spectral representation which we present has the advantage that 
it relies completely on Hilbert space intrinsic concepts and methods, namely the 
“geometric characterization of self-adjointness.” This approach has the additional 
advantage that it allows us to prove the fact that every closed symmetric operator has 
a “maximal self-adjoint part,” without any additional effort. 

Early results in the “spectral theory” of self-adjoint operators concentrated on 
the case where the operator is “compact.” Such operators do not have a continuous 
spectrum. We discuss here briefly the main results in this area, the “Riesz—Schauder 
theory” including the “Fredholm alternative” and several examples. 

The spectral representation of a self-adjoint operator A (14.1) has many applica- 
tions some of which we discuss in detail, others we just mention briefly. From the 
point of view of applications to quantum mechanics the following consequences are 
very important. Starting from the spectral representation (14.1) the classification of 
the different parts of the spectrum $\sigma(A)$ of the operator $A$ can be done in terms of
properties of the measures

$$dm_\psi(\lambda) = d\langle \psi, P_\lambda \psi\rangle, \qquad \psi \in \mathcal{H},$$


relative to the Lebesgue measure $d\lambda$. Here the most important distinction is whether
the measure $dm_\psi$ is absolutely continuous with respect to the Lebesgue measure
or not. In this way one gets a decomposition of the Hilbert space $\mathcal{H}$ into different
“spectral subspaces.” This spectral decomposition plays an important role in the
“scattering theory” for self-adjoint “Schrödinger operators” $H = H_0 + V$ in the
Hilbert space $\mathcal{H} = L^2(\mathbb{R}^3)$, for instance. According to physical intuition one expects
that every state of such a system is either a “bound state,” i.e. stays essentially
localized in a bounded region of $\mathbb{R}^3$, or a “scattering state,” i.e. a state which “escapes
to infinity.” The finer spectral analysis shows that this expectation is not always
correct. The final section of this part discusses when precisely this statement is
correct and how it is related to the different spectral subspaces of the Schrödinger
operator $H$.

As we will learn the spectrum of a self-adjoint operator consists not only of 
eigenvalues but often also has a continuous part. This means that to some parts of the 
spectrum there are no eigenfunctions in the Hilbert space and thus the subspace gen- 
erated by all eigenfunctions is a proper subspace of the Hilbert space, i.e. the system 
of eigenfunctions is not complete. However in many applications of spectral theory, 
also in quantum physics, it is desirable to have a complete set of eigenfunctions. This 
requirement led to the concept of “generalized eigenfunctions” (in the sense of
distribution theory). Chapter 29 “Spectral Analysis in rigged Hilbert spaces” develops
the appropriate framework in which one can prove the existence and completeness 
of generalized eigenfunctions. This result is used very successfully in Dirac’s bra 
and ket formalism. 

In the theory of operator algebras and in quantum physics the concept of a positive 
map plays a fundamental role. Accordingly we devote a chapter to the structural
analysis of positive mappings. The most prominent results are Naimark’s theorem on 
representations of the C*-algebra of all bounded linear operators on a Hilbert space, 
the Gelfand—Naimark—Segal construction for positive normalized functionals, the 
characterization of normal states, and the structural analysis of completely positive 
mappings (Stinespring’s factorization theorem). 

Though the concept of complete positivity was introduced in the context of a 
mathematical theory it is of fundamental importance in those parts of quantum theory 
which study composite quantum systems, in particular in the context of quantum 
operations and entangled states (see Chaps. 30 and 31). The technical reason is that 
the usual positivity condition for linear maps is not preserved under the formation of
tensor products (if $T_i$ are positive linear maps of operator algebras $\mathcal{A}_i$, $i = 1, 2$,
then the map $T_1 \otimes T_2$ of $\mathcal{A}_1 \otimes \mathcal{A}_2$ is in general not positive). The definition
of complete positivity (Definition 30.6) involves the formation of tensor products 
of operators in such a way that one could say that a linear map of an operator 
algebra $\mathcal{A}$ is completely positive whenever it is positive as a map of the much larger
algebra $M_k(\mathbb{C}) \otimes \mathcal{A}$ for any $k = 1, 2, \ldots$ where $M_k(\mathbb{C})$ is the algebra of all complex
$k \times k$-matrices.

The last chapter of this part applies the results of the structural analysis of positive
mappings to three important problems of quantum physics. The general form of
countably additive probability measures on the lattice of projections of a separable
Hilbert space of dimension greater than two is determined in Gleason’s theorem.
Then we present the general mathematical form of quantum operations (i.e. completely
positive linear mappings of the space of trace class operators into itself which
do not increase the trace) in Kraus’ first representation theorem. In finite dimensional
spaces the structural analysis of completely positive mappings allows some stronger
statements than Stinespring’s factorization theorem; these results are due to Choi
and have many important applications in quantum information theory.




14.2 Some Historical Remarks 


We sketch a few facts which led to the development of the theory of Hilbert spaces. 
For those readers who are interested in further details of the history of this theory and 
of functional analysis in general we recommend the book [1]. As mentioned above 
the theory of Hilbert spaces has its origin in the theory of expansion of arbitrary 
functions with respect to certain systems of orthogonal functions (with respect to a 
given inner product). Such systems of orthogonal functions usually were systems of 
eigenfunctions of certain linear differential operators. One can view these expansions 
as an infinite dimensional version of the Pythagorean theorem. 

In the second half of the nineteenth century, under the influence of mathematical 
physics, the focus of much research was on the linear partial differential equation 


$$\Delta_3 u(x) + \lambda u(x) = 0 \quad \forall x \in \Omega, \qquad u|_{\partial\Omega} = 0, \tag{14.2}$$

where $\Omega \subset \mathbb{R}^3$ is a nonempty domain with smooth boundary and where $\Delta_3$ is the
Laplace operator in three dimensions. In this context the concept of Green’s function
or elementary solution was introduced by Schwarz, as a predecessor of the concept of
elementary solution as introduced and discussed in the first part on distribution
theory (Sect. 8.4). Around 1894, H. Poincaré proved the existence and the main
properties of the eigenfunctions of the eigenvalue problem (14.2). 

As we will learn later, these results are closely related to the emergence of the 
theory of linear integral equations, i.e. equations of the form 


$$u(x) + \int_a^b K(x, y)\, u(y)\, dy = f(x), \tag{14.3}$$

in the case of one dimension, for an unknown function $u$, for a given kernel function
$K$ and a given source term $f$. And this theory of linear integral equations in turn
played a decisive role in the development of those ideas which shaped functional 
analysis, as we know it today. Many well-known mathematicians of that period, e.g. 
C. Neumann, H. Poincaré, I. Fredholm, V. Volterra, and E. Schmidt studied this type 
of equations and obtained many interesting results. Eventually, at the beginning of 
the twentieth century, D. Hilbert introduced a good number of new and very fruitful 
ideas. In his famous papers of 1906, he showed that solving the integral Eq. (14.3) is 
equivalent, under certain conditions on K and f, to solving the infinite linear system 
for the unknown real sequence $u_i$, $i = 1, 2, \ldots$, for a given infinite matrix with real
coefficients $K_{ij}$ and a given real sequence $f_i$:

$$u_i + \sum_{j=1}^{\infty} K_{ij} u_j = f_i, \qquad i = 1, 2, \ldots. \tag{14.4}$$

Furthermore he succeeded in showing that the only relevant solutions of this system
are those which satisfy the condition

$$\sum_{j=1}^{\infty} u_j^2 < \infty. \tag{14.5}$$




The set of all real sequences $(u_i)_{i \in \mathbb{N}}$ satisfying condition (14.5), i.e. the set of all
square summable real sequences, is denoted by $\ell^2(\mathbb{R})$. We will learn later that it is a
real vector space with an inner product so that this space is complete with respect to
the norm defined by this inner product. Thus $\ell^2(\mathbb{R})$ is an example of a Hilbert space.
Naturally one would expect that this space plays a prominent role in the theory of
Hilbert spaces and this expectation will be confirmed later when we learn that every
separable Hilbert space is isomorphic to $\ell^2(\mathbb{R})$ or $\ell^2(\mathbb{C})$.

All the Euclidean spaces $\mathbb{R}^n$, $n = 1, 2, \ldots$, are naturally embedded into $\ell^2(\mathbb{R})$
by assigning to the point $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ the sequence whose components
with index $i > n$ all vanish. In this sense we can consider the space $\ell^2(\mathbb{R})$ as the
natural generalization of the Euclidean space $\mathbb{R}^n$ to the case of infinite dimensions.
On the space $\ell^2(\mathbb{R})$, D. Hilbert introduced two important notions of convergence
which are known today as strong and weak convergence. These will be studied 
later in considerable detail. These two notions of convergence correspond to two 
different topologies on this infinite dimensional vector space. Linear mappings, linear 
functionals and bilinear forms were classified and studied by Hilbert on the basis of 
their continuity with respect to these two topologies. In such a space the meaning 
and interpretation of many concepts of Euclidean geometry were preserved. This 
is the case in particular for the theory of diagonalization of quadratic forms which is
well established in Euclidean spaces. Hilbert proved that also in the space $\ell^2(\mathbb{R})$
every quadratic form can be given a normal (i.e. diagonal) form by a “rotation of the 
coordinate system.” In his theory of diagonalization of quadratic forms in the infinite 
dimensional case, Hilbert discovered a number of new mathematical structures, e.g. 
the possibility of a “continuous spectrum.” 

Hilbert’s new theory was of great importance for the emerging quantum mechanics 
since it offered, through Hilbert’s new concept of a “mathematical spectrum,” the 
possibility of interpreting and understanding the energy spectra of atoms as they were 
observed experimentally. Since then the theory of Hilbert spaces grew enormously, 
mainly through its interaction with quantum physics. 

The next important step in the development of the theory of Hilbert spaces came 
through the ideas of M. Fréchet, E. Schmidt, and F. Riesz who introduced in the 
years 1907-1908 the concepts of Euclidean geometry (length, angle, orthogonality, 
basis, etc.) to the theory of Hilbert spaces. A remarkable early observation in these 
studies by F. Riesz and M. Fréchet was the following: The Lebesgue space $L^2(\mathbb{R})$
of all equivalence classes of square integrable functions on $\mathbb{R}$ has a very similar
geometry to the Hilbert space $\ell^2(\mathbb{R})$. Several months later the analogy between the
two spaces $L^2(\mathbb{R})$ and $\ell^2(\mathbb{R})$ was established completely when F. Riesz and E. Fischer
proved the completeness of the space $L^2(\mathbb{R})$ and the isomorphy of these spaces. Soon
one realized that many classical function spaces were also isomorphic to $\ell^2(\mathbb{R})$. Thus
most of the important properties of Hilbert spaces were already known at that period. 
Some of the most important contributions to this theory are contained in [2] and [3]. 

Later, around 1920, the abstract and axiomatic presentation of the theory of Hilbert 
spaces emerged, mainly through the efforts of J. von Neumann [4] and R. Riesz who 
also started major developments of the theory of linear operators on Hilbert spaces. 




Certainly there are many other interesting aspects of the history of the theory of Hilbert
spaces and their operators. These are addressed, for instance, in J. Dieudonné’s book
“History of Functional Analysis” [1] which we highly recommend, together with the
book [5].


14.3 Hilbert Spaces and Physics 


The intuitive steps leading to the recognition that the mathematical structure (Hilbert 
spaces, involutive algebras, representation theory of groups) offers the key to quantum 
theory appear to me as a striking corroboration of Einstein’s emphasis of free creation of the
mind and Dirac’s conviction that beauty and simplicity provide guidance. 

Rudolf Haag 


In our context Physics refers for the most part to “Quantum Physics.” In quantum 
physics a system, for instance a particle or several particles in some force field, is 
described in terms of “states,” “observables,” and “expectation values.” States are
given in terms of vectors in a Hilbert space, more precisely in terms of “unit rays”
generated by a nonvanishing vector in a Hilbert space. The set of all states of a system 
is called the state space. Observables are realized by self-adjoint operators in this 
Hilbert space while expectation values are calculated in terms of the inner product 
of the Hilbert space. 

In quantum physics, a particle is considered as an object which is localizable in 
(physical) space, i.e. in the Euclidean space $\mathbb{R}^3$. Its state space is the Hilbert space
$L^2(\mathbb{R}^3)$. The motivation for this choice is as follows. If the particle is in the state
given by $\psi \in L^2(\mathbb{R}^3)$, the quantity $|\psi(x)|^2$ has the interpretation of the probability
density of finding the particle at the point $x \in \mathbb{R}^3$ when a measurement is performed.
This interpretation which is due to M. Born obviously requires

$$\int_{\mathbb{R}^3} |\psi(x)|^2 \, d^3x = 1.$$

Thus the choice of $L^2(\mathbb{R}^3)$ as the state space of one localizable particle is consistent
with the probability interpretation of the “wave function” $\psi$. Observables are then
self-adjoint operators in $L^2(\mathbb{R}^3)$ and the expectation value $E_A(\psi)$ of an observable
described by the self-adjoint operator $A$ when the particle is in the state $\psi \in L^2(\mathbb{R}^3)$
is

$$E_A(\psi) = \frac{\langle \psi, A\psi\rangle}{\langle \psi, \psi\rangle} = \langle \psi, A\psi\rangle$$

for normalized $\psi$. The self-adjoint operators of quantum mechanics are typically
unbounded and thus not continuous. Therefore Hilbert’s original version of the theory
of Hilbert spaces and their operators could not cope with many important aspects and 
problems arising in quantum mechanics. Thus, in order to provide quantum mechanics
with a precise mathematical framework, R. Riesz, M. H. Stone, and in particular
J. von Neumann developed around 1930 an axiomatic approach to the theory of
Hilbert spaces and their operators. While in Hilbert’s understanding quadratic forms
(or operators) were given in terms of concrete quantities, J. von Neumann defined 
this concept abstractly, i.e. in terms of precise mathematical relations to previously
defined concepts. This step in abstraction allowed him to overcome the limitations of
Hilbert’s original theory and it enabled this abstract theory of Hilbert spaces to cope 
with all mathematical demands from quantum physics. A more recent example of 
the successful use of operator methods in quantum mechanics is the book [6]. 

Earlier we presented L. Schwartz’ theory of distributions as part of modern func- 
tional analysis, i.e. the unification in terms of concepts and methods of linear algebra 
and analysis. It is worthwhile mentioning here that the deep results of D. Hilbert, 
F. and R. Riesz, M. Fréchet, E. Fischer, J. von Neumann, and E. Schmidt were 
historically the starting point of modern functional analysis. 

Now we recall several applications of the theory of Hilbert spaces in classical 
physics. There this theory is used mainly in the form of Hilbert spaces $L^2(\Omega)$ of
square integrable functions on some measurable set $\Omega \subset \mathbb{R}^n$, $n = 1, 2, 3, \ldots$. If for
instance $\Omega$ is some interval in time and if $|f(t)|^2 \Delta t$ denotes the energy radiating off
some system, the total energy which is radiated off the system during this period in
time, is

$$E = \int_\Omega |f(t)|^2 \, dt = \|f\|^2_{L^2(\Omega)}.$$

In such a context physicists prefer to call the square integrable functions the “functions
with finite total energy.” Theorem 10. of Parseval–Plancherel states for this
case

$$\int_{\mathbb{R}} |f(t)|^2 \, dt = \int_{\mathbb{R}} |\tilde{f}(\nu)|^2 \, d\nu$$

where $\tilde{f}$ denotes the Fourier transform of the function $f$. In this way one has two
equivalent expressions for the total energy of the system. The quantity $|\tilde{f}(\nu)|^2$ has
naturally the interpretation of the radiated energy during a unit interval in frequency
space. The second integral in the above equation thus corresponds to a decomposition 
into harmonic components. It says that the total energy is the sum of the energies of 
all its harmonic components. This important result which is easily derived from the 
theory of the L? spaces was originally proposed by the physicist Lord Rayleigh. 
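
The equality of the two energy expressions has an exact discrete analogue which is easy to test. The sketch below is only an illustration under stated assumptions: it replaces the Fourier transform by a unitary discrete Fourier transform of finitely many samples, and the function names and the sample signal are hypothetical choices, not taken from the text.

```python
import cmath
import math


def unitary_dft(samples):
    """Discrete Fourier transform with the unitary 1/sqrt(N) normalization."""
    n = len(samples)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def energy(values):
    """Sum of squared moduli, the discrete stand-in for the integral of |f|^2."""
    return sum(abs(v) ** 2 for v in values)


# A damped oscillation sampled at 64 points (illustrative data).
signal = [math.exp(-0.1 * t) * math.cos(0.7 * t) for t in range(64)]
spectrum = unitary_dft(signal)

energy_time = energy(signal)         # "total energy" computed in time
energy_frequency = energy(spectrum)  # sum over the harmonic components
```

Up to rounding errors both numbers coincide, which is the finite dimensional counterpart of the statement that the total energy is the sum of the energies of all harmonic components.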

The conceptual and technical aspects of the development of quantum theory are 
well documented in the book [7] of M. Jammer. A quite comprehensive account of 
the development of quantum theory can be found in the six volumes of Mehra and 
Rechenberg [8]. 


References 


1. Dieudonné JA. Foundations of modern analysis. New York: Academic Press; 1969. 
2. Schmidt E. Zur Theorie der linearen und nichtlinearen Integralgleichungen. I. Teil: Entwicklung
willkürlicher Funktionen nach Systemen vorgeschriebener. Math Ann. 1907;LXIII:433-76.
3. Hilbert D. Grundzüge einer allgemeinen Theorie der Integralgleichungen. Leipzig: B. G.
Teubner; 1924.
4. von Neumann J. Mathematical foundations of quantum mechanics. Investigations in physics.
2nd print, editor. Vol. 2. Princeton: Princeton University Press; 1967.
5. Kramer EE. Chapter 23: Royal roads to functional analysis. pp. 550-76. Chapter 26: The Leonardos
of modern mathematics. pp. 625-38. In: The nature and growth of modern mathematics.
Princeton: Princeton University Press; 1982.
6. Schechter M. Operator methods in quantum mechanics. New York: North Holland; 1981.
7. Jammer M. The conceptual development of quantum mechanics. New York: Wiley; 1974.
8. Mehra J, Rechenberg H. The historical development of quantum theory. Vol. 1–6. New York:
Springer-Verlag; 2001.


Chapter 15 
Inner Product Spaces and Hilbert Spaces 


In close analogy with the Euclidean spaces, we develop in this short chapter the 
basis of the theory of inner product spaces or “pre-Hilbert spaces” and of “Hilbert 
spaces.” Recall that a Euclidean space is a finite-dimensional real or complex vector 
space equipped with an inner product (also called a scalar product). In the theory of 
Euclidean space we have the important concepts of the length of a vector, of orthogo- 
nality between two vectors, of an orthonormal basis, etc. Through the inner product it 
is straightforward to introduce these concepts in the infinite dimensional case too. In 
particular we will learn in a later chapter that, and how, a Hilbert space can be identi- 
fied with its topological dual space. This, together with the fact that Hilbert spaces can 
be considered as the natural extension of the concept of a Euclidean space to the infi- 
nite dimensional situation, gives Hilbert spaces a distinguished role in mathematical 
physics, in particular in quantum physics, and in functional analysis in general. 


15.1 Inner Product Spaces 


Before we turn our attention to the definition of abstract inner product spaces and 
Hilbert spaces we recall some basic facts about Euclidean spaces. We hope that thus 
the reader gets some intuitive understanding of Hilbert spaces. 

© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_15

The distinguishing geometrical property of the three-dimensional Euclidean
space $\mathbb{R}^3$ is the existence of the concept of the “angle between two vectors” of this
space, which has a concrete meaning. As is well known, this can be expressed in terms
of the inner product of this space. For $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ and $y = (y_1, y_2, y_3) \in \mathbb{R}^3$
one defines

$$\langle x, y\rangle = \sum_{j=1}^{3} x_j y_j.$$

Then

$$\|x\| = +\sqrt{\langle x, x\rangle}$$

is the Euclidean length of the vector $x \in \mathbb{R}^3$, and the angle $\theta$ between two vectors
$x, y \in \mathbb{R}^3$ is determined by the equation

$$\langle x, y\rangle = \|x\| \, \|y\| \cos\theta.$$

The angle $\theta$ is unique in the interval $[0, \pi]$.
For the finite-dimensional Euclidean spaces $\mathbb{R}^n$ and $\mathbb{C}^n$, $n \in \mathbb{N}$, the situation is
very similar; with the inner product

$$\langle x, y\rangle = \sum_{j=1}^{n} \overline{x_j}\, y_j, \qquad x, y \in \mathbb{C}^n,$$

the angle $\theta$ between $x, y$ is defined in the same way.

Thus we have three fundamental concepts at our disposal in these spaces together 
with their characteristic relation: vectors (linear structure), length of vectors (metric 
structure), and angle between two vectors (geometric structure). 


15.1.1 Basic Definitions and Results 


The concept of an inner product space or pre-Hilbert space is obtained by abstraction, 
by disregarding the restriction in the dimension of the underlying vector space. As 
in the finite dimensional case, the metric and geometric structures are introduced 
through the concept of an “inner” or “scalar” product. 


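Definition 15.1 For a vector space $V$ over the field $\mathbb{K}$ (of complex or real numbers)
every mapping

$$\langle \cdot, \cdot\rangle : V \times V \to \mathbb{K}$$

is called an inner product or a scalar product if this mapping satisfies the following
conditions:

(IP1) for all $x \in V$: $\langle x, x\rangle \ge 0$, and $\langle x, x\rangle = 0$ implies $x = 0 \in V$;
(IP2) $\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$ for all $x, y, z \in V$;
(IP3) $\langle x, \alpha y\rangle = \alpha \langle x, y\rangle$ for all $x, y \in V$ and all $\alpha \in \mathbb{K}$;
(IP4) $\overline{\langle x, y\rangle} = \langle y, x\rangle$ for all $x, y \in V$;

i.e., $\langle \cdot, \cdot\rangle$ is a positive definite sesquilinear form on $V$.
A vector space equipped with an inner product is called an inner product space
or a pre-Hilbert space.
There is an immediate consequence of this definition: For all $x, y, z \in V$ and all
$\alpha, \beta \in \mathbb{K}$ one has

$$\langle x, \alpha y + \beta z\rangle = \alpha \langle x, y\rangle + \beta \langle x, z\rangle, \qquad
\langle \alpha x, y\rangle = \overline{\alpha}\, \langle x, y\rangle.$$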

Note that in Definition 15.1 we have used the convention which is most popular
among physicists in requiring that an inner product is linear in the second argument
while it is antilinear in the first argument. Among mathematicians, linearity in the
first argument seems to be more popular.
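
The convention can be made concrete with the standard inner product on $\mathbb{C}^3$. The following is a minimal sketch which checks linearity in the second argument, antilinearity in the first argument, the Hermitian symmetry, and positivity for a few sample vectors; the names `inner` and `close` and all sample values are illustrative assumptions.

```python
def inner(x, y):
    """Standard inner product on C^n: antilinear in x, linear in y."""
    return sum(xj.conjugate() * yj for xj, yj in zip(x, y))


def close(a, b, tol=1e-12):
    """Comparison up to rounding errors."""
    return abs(a - b) <= tol


x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 4 + 0j, 1 + 1j]
z = [0.5j, 1 - 1j, -2 + 0j]
alpha, beta = 2 - 3j, -1 + 0.5j

# <x, alpha*y + beta*z> = alpha*<x, y> + beta*<x, z>
linear_in_second = close(
    inner(x, [alpha * yj + beta * zj for yj, zj in zip(y, z)]),
    alpha * inner(x, y) + beta * inner(x, z),
)

# <alpha*x, y> = conj(alpha) * <x, y>
antilinear_in_first = close(
    inner([alpha * xj for xj in x], y),
    alpha.conjugate() * inner(x, y),
)

# conj(<x, y>) = <y, x>, and <x, x> is a positive real number for x != 0
hermitian = close(inner(x, y).conjugate(), inner(y, x))
positive = inner(x, x).real > 0 and close(inner(x, x).imag, 0.0)
```

With the mathematicians' convention one would conjugate the second factor instead, and the roles of the two arguments in the checks above would be exchanged.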
We recall two well-known examples of inner products: 


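1) On the Euclidean space $\mathbb{K}^n$ the standard inner product is

$$\langle x, y\rangle = \sum_{j=1}^{n} \overline{x_j}\, y_j$$

for all $x = (x_1, \ldots, x_n) \in \mathbb{K}^n$ and all $y = (y_1, \ldots, y_n) \in \mathbb{K}^n$.

2) On the vector space $V = C(I, \mathbb{K})$ of continuous functions on the interval $I =
[a, b]$ with values in $\mathbb{K}$, the following formula defines an inner product as one
easily proves:

$$\langle f, g\rangle = \int_a^b \overline{f(x)}\, g(x)\, dx \qquad \forall f, g \in V.$$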


As in the Euclidean spaces the concept of orthogonality can be defined in any inner 
product space. 


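Definition 15.2 Suppose that $(V, \langle \cdot, \cdot\rangle)$ is an inner product space. One calls

a) an element $x \in V$ orthogonal to an element $y \in V$, denoted $x \perp y$, if, and only
if, $\langle x, y\rangle = 0$;

b) a system $(x_\alpha)_{\alpha \in A} \subset V$ orthonormal or an orthonormal system if, and only if,
$\langle x_\alpha, x_\beta\rangle = 0$ for $\alpha \ne \beta$ and $\langle x_\alpha, x_\alpha\rangle = 1$ for all $\alpha, \beta \in A$. Here $A$ is any index
set;

c) $\|x\| = +\sqrt{\langle x, x\rangle}$ the length of the vector $x \in V$.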


A simple and well-known example of an orthonormal system in the inner product
space $V = C(I, \mathbb{C})$, $I = [0, 2\pi]$, mentioned above, is the system of functions $f_n$,
$n \in \mathbb{Z}$, defined by $f_n(x) = \frac{1}{\sqrt{2\pi}} e^{inx}$, $x \in I$. By an elementary integration one finds

$$\langle f_n, f_m\rangle = \frac{1}{2\pi} \int_0^{2\pi} e^{-inx} e^{imx}\, dx = \delta_{nm}.$$
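
The orthonormality of the exponential functions $e^{inx}/\sqrt{2\pi}$ on $[0, 2\pi]$ can also be observed numerically. The sketch below approximates the integral by a rectangle rule with equally spaced nodes (a choice made here for simplicity, which for these periodic integrands is accurate up to rounding); the names `f`, `inner_product`, and `gram_entry` are illustrative assumptions.

```python
import cmath
import math


def f(n, x):
    """The function exp(i n x) / sqrt(2 pi)."""
    return cmath.exp(1j * n * x) / math.sqrt(2.0 * math.pi)


def inner_product(g, h, steps=2000):
    """Rectangle-rule approximation of the integral of conj(g) * h over [0, 2 pi]."""
    width = 2.0 * math.pi / steps
    return sum(g(k * width).conjugate() * h(k * width) for k in range(steps)) * width


def gram_entry(n, m):
    """Approximate inner product of the n-th and m-th exponential function."""
    return inner_product(lambda x: f(n, x), lambda x: f(m, x))


same_index = gram_entry(2, 2)        # close to 1
different_index = gram_entry(2, -3)  # close to 0
```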


In elementary geometry we learn the theorem of Pythagoras. The following lemma
shows that this result holds in any inner product space.

Lemma 15.1 (Theorem of Pythagoras) If $\{x_1, \ldots, x_N\}$, $N \in \mathbb{N}$, is an orthonormal
system in an inner product space $(V, \langle \cdot, \cdot\rangle)$, then, for every $x \in V$ the following
identity holds:

$$\|x\|^2 = \sum_{n=1}^{N} |\langle x_n, x\rangle|^2 + \Bigl\| x - \sum_{n=1}^{N} \langle x_n, x\rangle x_n \Bigr\|^2.$$

Proof Given any $x \in V$ introduce the vectors $y = \sum_{n=1}^{N} \langle x_n, x\rangle x_n$ and $z = x - y$.
Now we calculate, for $j \in \{1, \ldots, N\}$:

$$\langle x_j, z\rangle = \Bigl\langle x_j, x - \sum_{n=1}^{N} \langle x_n, x\rangle x_n \Bigr\rangle
= \langle x_j, x\rangle - \sum_{n=1}^{N} \langle x_n, x\rangle \langle x_j, x_n\rangle
= \langle x_j, x\rangle - \sum_{n=1}^{N} \langle x_n, x\rangle \delta_{jn} = 0.$$

It follows that $\langle y, z\rangle = 0$. This shows that $x = y + z$ is the decomposition of the
vector $x$ into a vector $y$ which is contained in the space spanned by the orthonormal
system and a vector $z$ which is orthogonal to this space. This allows us to calculate

$$\|x\|^2 = \langle y + z, y + z\rangle = \langle y, y\rangle + \langle z, y\rangle + \langle y, z\rangle + \langle z, z\rangle = \|y\|^2 + \|z\|^2.$$

And a straightforward calculation shows that $\langle y, y\rangle = \sum_{n=1}^{N} |\langle x_n, x\rangle|^2$ and thus
Pythagoras’ theorem follows.
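
The decomposition used in this proof can be traced in a small numerical example. The following is a minimal sketch in $\mathbb{C}^3$ with an orthonormal system of two vectors; the concrete vectors and all names are illustrative assumptions, and the inner product is taken antilinear in the first argument as in Definition 15.1.

```python
import math


def inner(x, y):
    """Standard inner product on C^n, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def norm_sq(x):
    """Squared length <x, x> (a real number)."""
    return inner(x, x).real


s = 1.0 / math.sqrt(2.0)
x1 = [s + 0j, s * 1j, 0j]
x2 = [s + 0j, -s * 1j, 0j]
system = [x1, x2]  # orthonormal system with N = 2

x = [1 + 1j, 2 - 1j, 3j]

# y = sum_n <x_n, x> x_n lies in the span of the system, z = x - y is orthogonal to it.
coefficients = [inner(xn, x) for xn in system]
y = [sum(c * xn[i] for c, xn in zip(coefficients, system)) for i in range(3)]
z = [xi - yi for xi, yi in zip(x, y)]

lhs = norm_sq(x)
rhs = sum(abs(c) ** 2 for c in coefficients) + norm_sq(z)
```

Here `lhs` and `rhs` agree up to rounding, and `inner(y, z)` vanishes up to rounding, as the proof predicts.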

Pythagoras’ theorem has two immediate consequences which are used in many 
estimates. 


Corollary 15.1

1. Bessel’s inequality: If $\{x_1, \ldots, x_N\}$ is a countable orthonormal system (i.e., $N \in
\mathbb{N}$ or $N = +\infty$) in a pre-Hilbert space $V$, then, for every $x \in V$, the following
estimate holds:

$$\sum_{n=1}^{N} |\langle x_n, x\rangle|^2 \le \|x\|^2.$$

2. Schwarz’ inequality: For any two vectors $x, y$ in a pre-Hilbert space $V$ one has

$$|\langle x, y\rangle| \le \|x\| \cdot \|y\|.$$

Proof To prove the first part take $L \in \mathbb{N}$, $L \le N$. Pythagoras’ theorem implies

$$S_L = \sum_{n=1}^{L} |\langle x_n, x\rangle|^2 \le \|x\|^2.$$

Thus, for $N \in \mathbb{N}$ the first part follows. If $N = +\infty$ one observes that $(S_L)_L$ is a
monotone increasing sequence which is bounded by $\|x\|^2$. Therefore this sequence
converges to a number which is smaller than or equal to $\|x\|^2$:

$$\sum_{n=1}^{\infty} |\langle x_n, x\rangle|^2 = \lim_{L \to \infty} S_L \le \|x\|^2$$

which proves Bessel’s inequality in the second case.
Schwarz’ inequality is an easy consequence of Bessel’s inequality. Take any two
vectors $x, y \in V$. If for instance $x = 0$, then $\langle x, y\rangle = \langle 0, y\rangle = 0$ and $\|x\| = 0$, and
Schwarz’ inequality holds in this case. If $x \ne 0$, then $\|x\| > 0$ and thus $\bigl\{\frac{x}{\|x\|}\bigr\}$ is an
orthonormal system in $V$. Hence for any $y \in V$ Bessel’s inequality implies

$$\Bigl| \Bigl\langle \frac{x}{\|x\|}, y \Bigr\rangle \Bigr|^2 \le \|y\|^2.$$

Now Schwarz’ inequality follows easily.
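
The monotone and bounded behaviour of the partial sums in Bessel's inequality can be observed for a concrete function. The sketch below is an illustration under stated assumptions: it uses the exponential functions $e^{int}/\sqrt{2\pi}$ on $[0, 2\pi]$ as orthonormal system, the function $x(t) = t$, and a rectangle rule as a discrete substitute for the integral; the exponentials are enumerated symmetrically ($n = 0, \pm 1, \pm 2, \ldots$), and all names are hypothetical.

```python
import cmath
import math

STEPS = 4096
WIDTH = 2.0 * math.pi / STEPS
NODES = [k * WIDTH for k in range(STEPS)]


def inner(g, h):
    """Rectangle-rule approximation of the integral of conj(g) * h over [0, 2 pi]."""
    return sum(g(t).conjugate() * h(t) for t in NODES) * WIDTH


def f(n):
    """The n-th exponential function exp(i n t) / sqrt(2 pi)."""
    return lambda t: cmath.exp(1j * n * t) / math.sqrt(2.0 * math.pi)


def x(t):
    """The function x(t) = t, returned as a complex number."""
    return complex(t)


norm_sq = inner(x, x).real
coefficient_sq = {n: abs(inner(f(n), x)) ** 2 for n in range(-8, 9)}

# Partial sums over |n| <= L, for L = 0, 1, ..., 8.
partial_sums = []
running = 0.0
for L in range(9):
    if L == 0:
        running += coefficient_sq[0]
    else:
        running += coefficient_sq[L] + coefficient_sq[-L]
    partial_sums.append(running)
```

The list `partial_sums` increases with `L` and stays below `norm_sq`, in agreement with the argument for the case $N = +\infty$.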




Remark 15.1 


1. In the literature Schwarz’ inequality is often called the Cauchy–Schwarz–Bunjakowski
inequality. It generalizes the classical Cauchy inequality

$$\Bigl( \sum_{j=1}^{n} a_j b_j \Bigr)^2 \le \Bigl( \sum_{j=1}^{n} a_j^2 \Bigr) \Bigl( \sum_{j=1}^{n} b_j^2 \Bigr) \qquad \forall a_j, b_j \in \mathbb{R}, \ \forall n \in \mathbb{N}.$$

2. Later in the section on the geometry of Hilbert spaces we will learn about
a powerful generalization of Schwarz’ inequality, in the form of the “Gram
determinants.”

3. Suppose that $(V, \langle \cdot, \cdot\rangle)$ is a real inner product space and suppose that $x, y \in V$
are two nonzero vectors. Then Schwarz’ inequality says

$$-1 \le \frac{\langle x, y\rangle}{\|x\| \, \|y\|} \le +1.$$

It follows that in the interval $[0, \pi]$ there is exactly one number $\theta = \theta(x, y)$ such
that $\cos\theta = \frac{\langle x, y\rangle}{\|x\| \, \|y\|}$. This number $\theta$ is called the angle between the vectors $x$
and $y$.
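
The definition of the angle translates directly into a short computation. The following is a minimal sketch for the real inner product space $\mathbb{R}^n$ with its standard inner product; the function names and the sample vectors are illustrative assumptions.

```python
import math


def inner(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Length induced by the inner product."""
    return math.sqrt(inner(x, x))


def angle(x, y):
    """Angle in [0, pi] between two nonzero vectors of a real inner product space."""
    cosine = inner(x, y) / (norm(x) * norm(y))
    # Schwarz' inequality gives |cosine| <= 1; the clamp only guards against rounding.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)


theta_orthogonal = angle([1.0, 0.0, 0.0], [0.0, 2.0, 0.0])
theta_parallel = angle([1.0, 2.0, 2.0], [2.0, 4.0, 4.0])
theta_diagonal = angle([1.0, 0.0], [1.0, 1.0])
```

Orthogonal vectors give $\pi/2$, positive multiples of each other give $0$, and the diagonal of the unit square encloses $\pi/4$ with the first coordinate axis.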


Finally we study the concept of length in a general inner product space, in analogy
to the Euclidean spaces.

Proposition 15.1 If $(V, \langle \cdot, \cdot\rangle)$ is a pre-Hilbert space, then the function $V \ni x \mapsto
\|x\| = +\sqrt{\langle x, x\rangle} \in \mathbb{R}^+$ has the following properties:

(N1) $\|x\| \ge 0$ for all $x \in V$;
(N2) $\|\lambda x\| = |\lambda| \, \|x\|$ for all $x \in V$ and all $\lambda \in \mathbb{K}$;
(N3) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in V$ (triangle inequality);
(N4) $\|x\| = 0$ if, and only if, $x = 0 \in V$.

This function $\|\cdot\| = \sqrt{\langle \cdot, \cdot\rangle}$ is thus a norm on $V$; it is called the norm induced by
the inner product.

Proof It is a straightforward calculation to verify properties (N1), (N2), and (N4)
using the basic properties of an inner product. This is done in the Exercises. Property
(N3) follows from Schwarz’ inequality, as the following calculations show:

$$\begin{aligned}
\|x + y\|^2 &= \langle x + y, x + y\rangle = \langle x, x\rangle + \langle y, x\rangle + \langle x, y\rangle + \langle y, y\rangle \\
&= \|x\|^2 + \|y\|^2 + 2 \operatorname{Re} \langle x, y\rangle \\
&\le \|x\|^2 + \|y\|^2 + 2 |\langle x, y\rangle| \\
&\le \|x\|^2 + \|y\|^2 + 2 \|x\| \, \|y\| = (\|x\| + \|y\|)^2,
\end{aligned}$$

hence the triangle inequality (N3) follows.




15.1.2 Basic Topological Concepts


Every inner product space $(V, \langle \cdot, \cdot\rangle)$ is a normed space under the induced norm
$\|\cdot\| = \sqrt{\langle \cdot, \cdot\rangle}$ and thus all results from the theory of normed spaces apply which we
have discussed in the first part. Here we recall some basic results.

The system of neighborhoods of a point $x \in V$ is the system of all subsets of $V$
which contain some open ball $B_r(x) = \{y \in V : \|y - x\| < r\}$ with center $x$ and
radius $r > 0$. This system of neighborhoods defines the norm topology on $V$. For this
topology one has: The addition $(x, y) \mapsto x + y$ is a continuous map $V \times V \to V$. The
scalar multiplication $(\lambda, x) \mapsto \lambda x$ is a continuous map $\mathbb{K} \times V \to V$. In the following
definition we recall the basic concepts related to convergence of sequences with
respect to the norm topology and express them explicitly in terms of the induced norm.


Definition 15.3 Equip the inner product space $(V, \langle \cdot, \cdot\rangle)$ with its norm topology.
One says:

a) A sequence $(x_n)_{n \in \mathbb{N}} \subset V$ is a Cauchy sequence if, and only if, for every $\varepsilon > 0$
there is $N \in \mathbb{N}$ such that $\|x_n - x_m\| < \varepsilon$ for all $n, m \ge N$;

b) A sequence $(x_n)_{n \in \mathbb{N}} \subset V$ converges if, and only if, there is $x \in V$ such that for
every $\varepsilon > 0$ there is $N \in \mathbb{N}$ such that $\|x - x_n\| < \varepsilon$ for all $n \ge N$;

c) The inner product space $(V, \langle \cdot, \cdot\rangle)$ is complete if, and only if, every Cauchy
sequence in $V$ converges.


Some immediate important consequences of these definitions are

Corollary 15.2

1. Every convergent sequence is a Cauchy sequence.
2. If a sequence $(x_n)_{n \in \mathbb{N}}$ converges to $x \in V$, then

$$\lim_{n \to \infty} \|x_n\| = \|x\|. \tag{15.1}$$

Proof The first part is obvious and is left as an exercise. Concerning the proof of the
second statement observe the basic estimate

$$\bigl|\, \|x\| - \|y\| \,\bigr| \le \|x \pm y\| \qquad \forall x, y \in V \tag{15.2}$$

which follows easily from the triangle inequality (see Exercises).
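
Since the estimate (15.2) is deferred to the Exercises, we sketch one possible derivation here. It uses only the homogeneity of the norm (with the scalar $-1$) and the triangle inequality, and then explains why (15.1) follows.

```latex
% Two applications of the triangle inequality:
\begin{align*}
\|x\| &= \|(x - y) + y\| \le \|x - y\| + \|y\|
  &&\Longrightarrow\quad \|x\| - \|y\| \le \|x - y\|, \\
\|y\| &= \|(y - x) + x\| \le \|y - x\| + \|x\|
  &&\Longrightarrow\quad \|y\| - \|x\| \le \|y - x\| = \|x - y\|.
\end{align*}
% Both estimates together give the version of (15.2) with the minus sign:
\[
  \bigl|\, \|x\| - \|y\| \,\bigr| \le \|x - y\| .
\]
% Replacing y by -y (note \|-y\| = \|y\|) gives the version with the plus sign.
% Consequence (15.1): if x_n converges to x, then
\[
  \bigl|\, \|x_n\| - \|x\| \,\bigr| \le \|x_n - x\| \longrightarrow 0
  \qquad (n \to \infty).
\]
```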
Remark 15.2 


1. The axiom of completeness of the space of real numbers R plays a very important 
role in (real and complex) analysis. There are two equivalent ways to formulate 
the completeness of the set of real numbers. (a) Every nonempty subset $M \subset \mathbb{R}$
which is bounded from above (from below) has a supremum (an infimum). (b) 
Every Cauchy sequence of real numbers converges. The first characterization of 
completeness relies on the order structure of real numbers. Such an order is not 
available in general. Therefore, we have defined completeness of an inner product 
space in terms of convergence of Cauchy sequences. 




2. Finite-dimensional and infinite-dimensional pre-Hilbert spaces differ in a very im- 
portant way: Every finite-dimensional pre-Hilbert space is complete, but there are 
infinite-dimensional pre-Hilbert spaces which are not complete. This is discussed 
in some detail in the Exercises. 

3. If a space is complete we can deal with convergence of a sequence without a 
priori knowledge of the limit. It suffices to verify that the sequence is a Cauchy 
sequence. Thus completeness often plays a decisive role in existence proofs.

4. In Appendix A it is shown that every metric space can be “completed” by “adding”
certain “limit elements.” This applies to pre-Hilbert spaces as well. We illustrate
this here for the pre-Hilbert space $\mathbb{Q}$ of rational numbers with the ordinary product
as inner product. In this case these limit elements are those real numbers which
are not rational, and thus rather different from the original rational numbers. In
this process of completion the (inner) product is extended by continuity to all real
numbers.
We mention another example illustrating the fact that these limit elements generated
in the process of completion are typically very different from the elements
of the original space. If one completes the inner product space $V$ of all polynomials
on an interval $I = [a, b]$, $a, b \in \mathbb{R}$, $a < b$, with the inner product
$\langle f, g\rangle = \int_a^b \overline{f(t)}\, g(t)\, dt$, one obtains the Lebesgue space $L^2(I, dt)$ whose
elements differ in many ways from polynomials.
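
The role of the missing limit elements in the rational numbers can be illustrated with exact rational arithmetic. The sketch below is an illustration chosen here, not an example from the text: it generates the Newton iterates for $\sqrt{2}$, which are all rational, whose consecutive distances shrink rapidly, and whose squares never equal 2. The function name `newton_sqrt2` is a hypothetical helper.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational Newton iterates x_{k+1} = (x_k + 2 / x_k) / 2, starting at x_0 = 1."""
    x = Fraction(1)
    iterates = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        iterates.append(x)
    return iterates


iterates = newton_sqrt2(6)

# Distances between consecutive iterates: they shrink rapidly.
gaps = [abs(b - a) for a, b in zip(iterates, iterates[1:])]

# Distance of x_k^2 from 2: small, but never zero for a rational x_k.
residuals = [abs(x * x - 2) for x in iterates]
```

The shrinking gaps are consistent with the iterates forming a Cauchy sequence of rational numbers, while the nonvanishing residuals reflect that its limit $\sqrt{2}$ is not rational; the limit exists only in the completion.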


One has to distinguish clearly the concepts complete and closed for a space. A
topological space $V$ which is not complete is closed, as is every topological space.
However as part of its completion $\tilde{V}$, the space $V$ is not closed. The closure of $V$ in
the space $\tilde{V}$ is just $\tilde{V}$. This will be evident when we look at the construction of the
completion of a metric space in some detail in Appendix A.

The basic definition of this part is 


Definition 15.4 (J. von Neumann, 1925) A Hilbert space is an inner product space 
which is complete (with respect to its norm topology). 

Thus, in order to verify whether a given inner product space is a Hilbert space, 
one has to show that every Cauchy sequence in this space converges. Therefore, 
Hilbert spaces are examples of complete normed spaces, i.e., of Banach spaces. It is 
interesting and important to know when a given Banach space is actually a Hilbert 
space, i.e., when its norm is induced by an inner product. The following subsection 
addresses this question. 


15.1.3 On the Relation Between Normed Spaces and Inner 
Product spaces 


As we know, for instance from the example of Euclidean spaces, one can define on a 
vector space many different norms. The norm induced by the inner product satisfies 
a characteristic identity which is well known from elementary Euclidean geometry. 




Lemma 15.2 (Parallelogram Law) In an inner product space $(V, \langle\cdot,\cdot\rangle)$ the
norm induced by the inner product satisfies the identity

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \qquad \forall\, x, y \in V. \tag{15.3}$$


Proof The simple proof is done in the Exercises. 

The intuitive meaning of the parallelogram law is as in elementary geometry. To 
see this recall that x + y and x — y are the two diagonals of the parallelogram spanned 
by the vectors x and y. 

According to Lemma 15.2 the parallelogram law (15.3) is a necessary condition for
a norm to be induced by a scalar product, i.e., to be a Hilbertian norm. Naturally, not
every norm satisfies the parallelogram law as the following simple example shows.

Consider the vector space $V = C([0,3], \mathbb{R})$ of continuous real functions on the
interval $I = [0,3]$. We know that

$$\|f\| = \sup_{t \in I} |f(t)|$$

defines a norm on $V$. In the Exercises it is shown that there are functions $f, g \in V$
such that $\|f\| = \|g\| = \|f + g\| = \|f - g\| = 1$. For such $f, g$ the left-hand side of
(15.3) equals 2 while the right-hand side equals 4. It follows that this norm does not
satisfy (15.3). Hence this norm is not induced by an inner product.
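
The failure of the parallelogram law for the sup norm can also be checked numerically. The following Python sketch is only an illustration: the tent functions f and g, the helper names, and the sampling grid are our own assumptions, and the supremum is approximated by a maximum over grid points.

```python
# Sketch: the sup norm on C([0, 3], R) violates the parallelogram law (15.3).
# The tent functions below are an assumed choice with disjoint supports.

def tent(center: float):
    """Continuous tent of height 1 and half-width 1/2 around `center`."""
    return lambda t: max(0.0, 1.0 - 2.0 * abs(t - center))

def sup_norm(h, points) -> float:
    """Approximate sup |h(t)| by the maximum over the sample points."""
    return max(abs(h(t)) for t in points)

grid = [3.0 * k / 600 for k in range(601)]   # sample points in [0, 3]
f, g = tent(1.0), tent(2.0)

norms = {
    "f": sup_norm(f, grid),
    "g": sup_norm(g, grid),
    "f+g": sup_norm(lambda t: f(t) + g(t), grid),
    "f-g": sup_norm(lambda t: f(t) - g(t), grid),
}
lhs = norms["f+g"] ** 2 + norms["f-g"] ** 2       # left-hand side of (15.3)
rhs = 2 * norms["f"] ** 2 + 2 * norms["g"] ** 2   # right-hand side of (15.3)
print(norms, lhs, rhs)
```

All four norms come out as 1, so the two sides of (15.3) are 2 and 4 respectively.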

The following proposition shows that the parallelogram law is not only necessary 
but also sufficient for a norm to be a Hilbertian norm. 


Proposition 15.2 (Fréchet–von Neumann–Jordan) If in a normed space $(V, \|\cdot\|)$
the parallelogram law holds, then there is an inner product on $V$ such that
$\|x\|^2 = \langle x,x \rangle$ for all $x \in V$.

If $V$ is a real vector space, then the inner product is defined by the polarization
identity

$$\langle x,y \rangle = \frac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2\right) \qquad \forall\, x,y \in V; \tag{15.4}$$

if $V$ is a complex vector space the inner product is given by the polarization identity

$$\langle x,y \rangle = \frac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2 + i\|x - iy\|^2 - i\|x + iy\|^2\right) \qquad \forall\, x,y \in V. \tag{15.5}$$


Proof The proof is left as an exercise. 
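
As a plausibility check (not a proof), the complex polarization identity can be tested numerically. The Python sketch below assumes the standard inner product on $\mathbb{C}^2$, taken antilinear in the first argument, so that the identity reads $\langle x,y\rangle = \frac{1}{4}(\|x+y\|^2 - \|x-y\|^2 + i\|x-iy\|^2 - i\|x+iy\|^2)$. The helper names and the test vectors are illustrative assumptions.

```python
# Sketch: check the complex polarization identity in C^2 for the inner
# product <x, y> = sum(conj(x_j) * y_j), antilinear in the first argument.

def inner(x, y):
    return sum(a.conjugate() * b for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def add(x, y, c=1):
    """Return x + c*y componentwise."""
    return [a + c * b for a, b in zip(x, y)]

def polarization(x, y):
    """Right-hand side of the complex polarization identity."""
    return 0.25 * (
        norm_sq(add(x, y)) - norm_sq(add(x, y, -1))
        + 1j * norm_sq(add(x, y, -1j)) - 1j * norm_sq(add(x, y, 1j))
    )

x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
print(inner(x, y), polarization(x, y))
```

For an inner product that is linear in the first argument the signs of the two imaginary terms are interchanged.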

Without proof (see however [1, 2]) we mention two other criteria which ensure 
that a norm is actually a Hilbertian norm. Here we have to use some concepts which 
are only introduced in later sections. 


Proposition 15.3 (Kakutani, 1939) Suppose that $(V, \|\cdot\|)$ is a normed space of
dimension $\geq 3$. If every subspace $F \subset V$ of dimension 2 has a projector of norm 1,
then the norm is Hilbertian.




Proposition 15.4 (de Figueiredo–Karlovitz, 1967) Let $(V, \|\cdot\|)$ be a normed
space of dimension $\geq 3$; define a map $T$ from $V$ into the closed unit ball
$B = \{x \in V : \|x\| \leq 1\}$ by

$$Tx = \begin{cases} x & \text{if } \|x\| \leq 1, \\[4pt] \dfrac{x}{\|x\|} & \text{if } \|x\| \geq 1. \end{cases}$$

If $\|Tx - Ty\| \leq \|x - y\|$ for all $x,y \in V$, then $\|\cdot\|$ is a Hilbertian norm.

It is worthwhile to mention that in a normed space one always has
$\|Tx - Ty\| \leq 2\|x - y\|$ for all $x,y \in V$. In general the constant 2 in this estimate cannot be
improved.
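
The behaviour of the radial projection $T$ can be illustrated on $\mathbb{R}^3$. The Python sketch below compares the maximum norm, which is not Hilbertian, with the Euclidean norm; the two points x and y and all function names are assumptions chosen for this illustration only.

```python
# Sketch: the radial projection onto the unit ball is 1-Lipschitz for the
# Euclidean norm on R^3, but not for the maximum norm (sample points assumed).
from math import sqrt

def max_norm(x):
    return max(abs(c) for c in x)

def euclid_norm(x):
    return sqrt(sum(c * c for c in x))

def radial_projection(x, norm):
    """T x = x if ||x|| <= 1, and x / ||x|| otherwise."""
    r = norm(x)
    return list(x) if r <= 1 else [c / r for c in x]

def dist(x, y, norm):
    return norm([a - b for a, b in zip(x, y)])

x = [1.0, -1.0, 0.0]
y = [1.5, -0.5, 0.0]

results = {}
for name, norm in (("max", max_norm), ("euclid", euclid_norm)):
    before = dist(x, y, norm)
    after = dist(radial_projection(x, norm), radial_projection(y, norm), norm)
    results[name] = (before, after)
    print(name, before, after)
```

For the maximum norm the distance grows from 1/2 to 2/3 under $T$, which stays below the general bound $2\|x - y\|$; for the Euclidean norm the distance does not increase.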


15.1.4 Examples of Hilbert Spaces 


We discuss a number of concrete examples of Hilbert spaces which are used in many 
applications of the theory of Hilbert spaces. The generic notation for a Hilbert space 
is H. 


1. The Euclidean spaces: As mentioned before, the Euclidean spaces $\mathbb{K}^n$ are Hilbert
spaces when they are equipped with the inner product $\langle x, y \rangle = \sum_{j=1}^n \overline{x_j}\, y_j$ for all
$x,y \in \mathbb{K}^n$. Since vectors in $\mathbb{K}^n$ have a finite number of components, completeness
of the inner product space $(\mathbb{K}^n, \langle\cdot,\cdot\rangle)$ follows easily from that of $\mathbb{K}$.

2. Matrix spaces: Denote by $M_n(\mathbb{K})$ the set of all $n \times n$ matrices with coefficients
in $\mathbb{K}$, $n = 2,3,\ldots$. Addition and scalar multiplication are defined componentwise.
This gives $M_n(\mathbb{K})$ the structure of a vector space over the field $\mathbb{K}$. In order
to define an inner product on this vector space recall the definition of the trace
of a matrix $A \in M_n(\mathbb{K})$. If $A_{ij} \in \mathbb{K}$ are the components of $A$, the trace of $A$
is defined as $\operatorname{Tr} A = \sum_{j=1}^n A_{jj}$. The transpose of the complex conjugate matrix
$\overline{A}$ is called the adjoint of $A$ and denoted by $A^* = \overline{A}^{\,t}$. It is easy to show that
$\langle A, B \rangle = \operatorname{Tr}(A^*B) = \sum_{i,j=1}^n \overline{A_{ij}}\, B_{ij}$, $A, B \in M_n(\mathbb{K})$, defines a scalar product.
Again, completeness of this inner product space follows easily from completeness
of $\mathbb{K}$ since matrices have a finite number of coefficients (see Exercises).

3. The sequence space: Recall that the space $\ell^2(\mathbb{K})$ of all sequences in $\mathbb{K}$ which
are square summable was historically the starting point of the theory of Hilbert
spaces. This space can be considered as the natural generalization of the Euclidean
spaces $\mathbb{K}^n$ for $n \to \infty$. Here we show that this set is a Hilbert space in the sense
of the axiomatic definition given above. Again, addition and scalar multiplication
are defined componentwise. If $x = (x_n)_{n\in\mathbb{N}}$ and $y = (y_n)_{n\in\mathbb{N}}$ are elements in
$\ell^2(\mathbb{K})$, then the estimate $|x_n + y_n|^2 \leq 2(|x_n|^2 + |y_n|^2)$ implies that the sequence
$x + y = (x_n + y_n)_{n\in\mathbb{N}}$ is square summable too and thus addition is well defined.
Similarly it follows that scalar multiplication is well defined and therefore $\ell^2(\mathbb{K})$
is a vector space over the field $\mathbb{K}$. The estimate $|\overline{x_n}\, y_n| \leq \frac{1}{2}(|x_n|^2 + |y_n|^2)$ implies
that the series $\sum_{n=1}^\infty \overline{x_n}\, y_n$ converges absolutely for $x, y \in \ell^2(\mathbb{K})$ and thus can be
used to define

$$\langle x, y \rangle = \sum_{n=1}^\infty \overline{x_n}\, y_n \tag{15.6}$$

as a candidate for an inner product on the vector space $\ell^2(\mathbb{K})$. In the Exercises
we show that Eq. (15.6) defines indeed a scalar product on this space. Finally we
show the completeness of the inner product space $(\ell^2(\mathbb{K}), \langle\cdot,\cdot\rangle)$.

Suppose that $(x^i)_{i\in\mathbb{N}}$ is a Cauchy sequence in this inner product space. Then each
$x^i$ is a square summable sequence $(x^i_n)_{n\in\mathbb{N}}$ and for every $\varepsilon > 0$ there is an $i_0 \in \mathbb{N}$
such that for all $i, j \geq i_0$ we have

$$\|x^i - x^j\|^2 = \sum_{n=1}^\infty |x^i_n - x^j_n|^2 < \varepsilon^2.$$

It follows that for every $n \in \mathbb{N}$ the sequence $(x^i_n)_{i\in\mathbb{N}}$ is actually a Cauchy sequence
in $\mathbb{K}$. Completeness of $\mathbb{K}$ implies that these Cauchy sequences converge, i.e., for
all $n \in \mathbb{N}$ the limits $x_n = \lim_{i\to\infty} x^i_n$ exist in $\mathbb{K}$.

Next we prove that the sequence $x = (x_n)_{n\in\mathbb{N}}$ of these limits is square summable.
Given $\varepsilon > 0$ choose $i_0 \in \mathbb{N}$ as in the basic Cauchy estimate above. Then, for all
$i, j \geq i_0$ and for all $m \in \mathbb{N}$,

$$\sum_{n=1}^m |x^i_n - x^j_n|^2 \leq \|x^i - x^j\|^2 < \varepsilon^2;$$

and we deduce, since limits can be taken in finite sums,

$$\lim_{i\to\infty} \sum_{n=1}^m |x^i_n - x^j_n|^2 = \sum_{n=1}^m |x_n - x^j_n|^2 \leq \varepsilon^2$$

for all $j \geq i_0$ and all $m \in \mathbb{N}$. Therefore, for each $j \geq i_0$, $s_m = \sum_{n=1}^m |x_n - x^j_n|^2$
is a monotone increasing sequence with respect to $m \in \mathbb{N}$ which is bounded by
$\varepsilon^2$. Hence this sequence has a limit, with the same upper bound:

$$\sum_{n=1}^\infty |x_n - x^j_n|^2 = \lim_{m\to\infty} s_m \leq \varepsilon^2,$$

i.e., for each $j \geq i_0$, we know $\|x - x^j\| \leq \varepsilon$. Since $\|x\| = \|x - x^j + x^j\| \leq
\|x - x^j\| + \|x^j\| \leq \varepsilon + \|x^j\|$, for fixed $j \geq i_0$, the sequence $x$ belongs to $\ell^2(\mathbb{K})$
and the given Cauchy sequence $(x^i)_{i\in\mathbb{N}}$ converges (with respect to the induced
norm) to $x$. It follows that every Cauchy sequence in $\ell^2(\mathbb{K})$ converges, thus this
space is complete.


Proposition 15.5 The space $\ell^2(\mathbb{K})$ of square summable sequences is a Hilbert
space.
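
To make the Cauchy estimate of the proof concrete, consider the truncations $x^i = (1, \frac{1}{2}, \ldots, \frac{1}{i}, 0, 0, \ldots)$ of the harmonic sequence. The Python sketch below, with function names that are our own assumptions, computes $\|x^i - x^{2i}\|$ and compares it with the elementary bound $\sqrt{1/i}$, which follows from $\sum_{n > i} n^{-2} < 1/i$.

```python
# Sketch: the truncations of (1/n) form a Cauchy sequence in l^2.
from math import sqrt

def truncation(i: int):
    """First i terms of the sequence (1/n); all later terms are 0."""
    return [1.0 / n for n in range(1, i + 1)]

def l2_distance(a, b) -> float:
    """l^2 distance of two finitely supported sequences given as lists."""
    length = max(len(a), len(b))
    a = a + [0.0] * (length - len(a))
    b = b + [0.0] * (length - len(b))
    return sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))

for i in (10, 100, 1000):
    d = l2_distance(truncation(i), truncation(2 * i))
    print(i, d, sqrt(1.0 / i))
```

The limit of this sequence in $\ell^2(\mathbb{K})$ is $(1/n)_{n\in\mathbb{N}}$, which has infinitely many nonzero entries. Hence the subspace of finitely supported sequences is an inner product space which is not complete.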




4. The Lebesgue space: For this example we have to assume familiarity of the reader
with the basic aspects of Lebesgue's integration theory. Here we concentrate on
the Hilbert space aspects.

Denote by $\mathcal{L}(\mathbb{R}^n)$ the set of Lebesgue measurable functions $f : \mathbb{R}^n \to \mathbb{K}$ which
are square integrable, i.e., for which the Lebesgue integral

$$\int_{\mathbb{R}^n} |f(x)|^2\, dx$$

is finite. Since for almost all $x \in \mathbb{R}^n$ one has $|f(x) + g(x)|^2 \leq 2(|f(x)|^2 +
|g(x)|^2)$, it follows easily that $\mathcal{L}(\mathbb{R}^n)$ is a vector space over $\mathbb{K}$. Similarly one
has $2|f(x)g(x)| \leq |f(x)|^2 + |g(x)|^2$, for almost all $x \in \mathbb{R}^n$, and therefore
$2\int_{\mathbb{R}^n} |f(x)g(x)|\, dx \leq \int_{\mathbb{R}^n} |f(x)|^2\, dx + \int_{\mathbb{R}^n} |g(x)|^2\, dx$, for all $f, g \in \mathcal{L}(\mathbb{R}^n)$.
Thus a function $\mathcal{L}(\mathbb{R}^n) \times \mathcal{L}(\mathbb{R}^n) \ni (f,g) \mapsto \langle f,g\rangle_2 = \int_{\mathbb{R}^n} \overline{f(x)}\, g(x)\, dx \in \mathbb{K}$ is
well defined. The basic rules for the Lebesgue integral imply that this function
satisfies conditions (IP2)–(IP4) of Definition 15.1. It also satisfies $\langle f, f\rangle_2 \geq 0$
for all $f \in \mathcal{L}(\mathbb{R}^n)$. However $\langle f, f\rangle_2 = 0$ does not imply $f = 0 \in \mathcal{L}(\mathbb{R}^n)$.
Therefore one introduces the “kernel” $\mathcal{N} = \{f \in \mathcal{L}(\mathbb{R}^n) : \langle f, f\rangle_2 = 0\}$ of $\langle\cdot,\cdot\rangle_2$
which consists of all those functions in $\mathcal{L}(\mathbb{R}^n)$ which vanish almost everywhere
on $\mathbb{R}^n$. As above it follows that $\mathcal{N}$ is a vector space over $\mathbb{K}$. Now introduce the
quotient space

$$L^2(\mathbb{R}^n) = \mathcal{L}(\mathbb{R}^n)/\mathcal{N}$$

with respect to this kernel which consists of all equivalence classes

$$[f] = f + \mathcal{N}, \qquad f \in \mathcal{L}(\mathbb{R}^n).$$

On this quotient space we define

$$\langle [f], [g] \rangle_2 = \langle f, g \rangle_2$$

where $f, g \in \mathcal{L}(\mathbb{R}^n)$ are any representatives of their respective equivalence class.
It is straightforward to show that now $\langle\cdot,\cdot\rangle_2$ is a scalar product on $L^2(\mathbb{R}^n)$. Hence
$\mathcal{H} = (L^2(\mathbb{R}^n), \langle\cdot,\cdot\rangle_2)$ is an inner product space. That it is actually a Hilbert space
follows from the important theorem


Theorem 15.1 (Riesz–Fischer) The inner product space

$$\mathcal{H} = (L^2(\mathbb{R}^n), \langle\cdot,\cdot\rangle_2)$$

is complete.

Following tradition we identify the equivalence class $[f]$ with its representative
$f$ in $\mathcal{L}(\mathbb{R}^n)$ in the rest of the book. Similarly one introduces the Lebesgue
spaces $L^2(\Omega)$ for measurable subsets $\Omega \subset \mathbb{R}^n$ with nonempty interior. They too
are Hilbert spaces.

5. The Sobolev spaces: For an open nonempty set $\Omega \subset \mathbb{R}^n$ denote by $W^{k,2}(\Omega)$ the
space of all $f \in L^2(\Omega)$ which have “weak” or distributional derivatives $D^\alpha f$
of all orders $\alpha$, $|\alpha| \leq k$, for $k = 0,1,2,\ldots$, which again belong to $L^2(\Omega)$.
Obviously one has

$$W^{k+1,2}(\Omega) \subset W^{k,2}(\Omega) \subset \cdots \subset W^{0,2}(\Omega) = L^2(\Omega), \qquad k = 0,1,2,\ldots.$$

On $W^{k,2}(\Omega)$ the natural inner product is

$$\langle f, g \rangle_{k,2} = \sum_{|\alpha| \leq k} \int_\Omega \overline{D^\alpha f(x)}\, D^\alpha g(x)\, dx \qquad \forall\, f,g \in W^{k,2}(\Omega).$$

It is fairly easy to verify that this function defines indeed a scalar product on the
Sobolev space $W^{k,2}(\Omega)$. Finally, completeness of the Hilbert space $L^2(\Omega)$ implies
completeness of the Sobolev spaces. Details of the proof are considered in the
Exercises.

Obviously these spaces $W^{k,2}(\Omega)$ are the special case $p = 2$ of the class of spaces
$W^{k,p}(\Omega)$ introduced in Theorem 13.1.


15.2 Exercises


1. Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space. For $x \in V$ define $\|x\| = +\sqrt{\langle x, x\rangle}$ and
show that $V \ni x \mapsto \|x\|$ is a norm on $V$.
2. Give an example of a pre-Hilbert space which is not complete.
Hint: Consider for instance the space $V = C(I;\mathbb{R})$ of continuous real-valued
functions on the interval $I = [-1, 1]$ and equip this infinite-dimensional space
with the inner product $\langle f,g\rangle_2 = \int_{-1}^{1} f(t)g(t)\, dt$. Then show that the sequence
$(f_n)_{n\in\mathbb{N}}$ in $V$ defined by

$$f_n(t) = \begin{cases} 0 & -1 \leq t \leq 0, \\ nt & 0 < t < \frac{1}{n}, \\ 1 & \frac{1}{n} \leq t \leq 1 \end{cases}$$

is a Cauchy sequence in this inner product space which does not converge to an
element in $V$.
3. Show: For any two vectors $a, b$ in a normed space $(V, \|\cdot\|)$ one has

$$\pm(\|a\| - \|b\|) \leq \|a \pm b\|$$

for any combination of the $\pm$ signs.

4. Prove the parallelogram law (15.3). 

5. On the vector space $V = C([0, 3], \mathbb{R})$ of continuous real functions on the interval
$I = [0,3]$ define the norm

$$\|f\| = \sup_{t \in I} |f(t)|$$

and show that there are functions $f, g \in V$ such that $\|f\| = \|g\| = \|f + g\| =
\|f - g\| = 1$. It follows that this norm does not satisfy the identity (15.3). Hence
this norm is not induced by an inner product.




Hint: Consider functions $f, g \in V$ with disjoint supports.
6. Prove Proposition 15.2.
7. Show that the space $M_n(\mathbb{K})$ of $n \times n$ matrices with coefficients in $\mathbb{K}$ is a Hilbert
space under the inner product $\langle A, B\rangle = \operatorname{Tr}(A^*B)$, $A, B \in M_n(\mathbb{K})$.
8. Prove: Equation (15.6) defines an inner product on the sequence space $\ell^2(\mathbb{K})$.
9. Prove that the Sobolev spaces $W^{k,2}(\Omega)$, $k \in \mathbb{N}$, are Hilbert spaces.
Hint: Use completeness of the Lebesgue space $L^2(\Omega)$.


References 


1. Kakutani S. Some characterizations of Euclidean spaces. Jpn J Math. 1939;16:93-7. 
2. de Figueiredo DG, Karlovitz L. On the radial projection in normed spaces. Bull Am Math Soc.
1967;73:364-8.


Chapter 16 
Geometry of Hilbert Spaces 


According to its definition a Hilbert space differs from a general Banach space in the 
important aspect that the norm is derived from an inner product. This inner product 
provides additional structure, mainly of geometric nature. This short chapter looks 
at basic and mostly elementary consequences of the presence of an inner product in 
a (pre-) Hilbert space. 


16.1 Orthogonal Complements and Projections 


In close analogy to the corresponding concepts in Euclidean spaces, the concepts 
of orthogonal complement and projections are introduced and basic properties are 
studied. This analogy helps to understand these results in the general infinite dimen- 
sional setting. Only very few additional difficulties occur in the infinite dimensional 
case as will become apparent later. 


Definition 16.1 For any subset $M$ in a pre-Hilbert space $(V, \langle\cdot,\cdot\rangle)$ the orthogonal
complement of $M$ in $V$ is defined as

$$M^\perp = \{y \in V : \langle y, x\rangle = 0 \quad \forall\, x \in M\}.$$


There are a number of elementary but important consequences of this definition. 


Lemma 16.1 Suppose that (V, (-,-)) is an inner product space. Then the following 
holds: 


1. $V^\perp = \{0\}$ and $\{0\}^\perp = V$.

2. For any two subsets $M \subset N \subset V$ one has $N^\perp \subset M^\perp$.

3. The orthogonal complement $M^\perp$ of any subset $M \subset V$ is a linear subspace of $V$.

4. If $0 \in M \subset V$, then $M \cap M^\perp = \{0\}$.


The simple proof is done in the Exercises. 
The following definition and subsequent discussion take into account that linear 
subspaces are not necessarily closed in the infinite dimensional setting. 


© Springer International Publishing Switzerland 2015 227 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_16 




Definition 16.2

1. A closed subspace of a Hilbert space $\mathcal{H}$ is a linear subspace of $\mathcal{H}$ which is closed.
2. If $M$ is any subset of a Hilbert space $\mathcal{H}$, the span or linear hull of $M$ is defined
by

$$\operatorname{lin} M = \Big\{ \sum_{j=1}^n a_j x_j : x_j \in M,\ a_j \in \mathbb{K},\ n \in \mathbb{N} \Big\}.$$

3. The closed subspace generated by a set $M$ is the closure of the linear hull; it is
denoted by $[M]$, i.e.,

$$[M] = \overline{\operatorname{lin} M}.$$


That these definitions, respectively notations, are consistent is the contents of the 
next lemma. 


Lemma 16.2 For a subset $M$ in a Hilbert space $\mathcal{H}$ the following holds:

1. The linear hull $\operatorname{lin} M$ is the smallest linear subspace of $\mathcal{H}$ which contains $M$.

2. The closure of a linear subspace is again a linear subspace.

3. The orthogonal complement $M^\perp$ of a subset $M$ is a closed subspace.

4. The orthogonal complement of a subset $M$ and the orthogonal complement of the
closed subspace generated by $M$ are the same:

$$M^\perp = [M]^\perp.$$


Proof The proof of the first two items is left as an exercise. 

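For the proof of the third point observe first that according to Lemma 16.1 $M^\perp$ is
a linear subspace. In order to show that this linear subspace is closed, take any $y \in \mathcal{H}$
in the closure of $M^\perp$. Then there is a sequence $(y_n)_{n\in\mathbb{N}} \subset M^\perp$ which converges to $y$.
Therefore, for any $x \in M$, we know $\langle y, x\rangle = \lim_{n\to\infty} \langle y_n, x\rangle$, because of continuity
of the inner product. Now $y_n \in M^\perp$ implies $\langle y_n, x\rangle = 0$ and thus $\langle y, x\rangle = 0$. We conclude
that $y \in M^\perp$. This proves $\overline{M^\perp} \subset M^\perp$ and thus $\overline{M^\perp} = M^\perp$ is closed and the third
point follows.

In a first step of the proof of the fourth part we show: $M^\perp = (\operatorname{lin} M)^\perp$. Since
$M \subset \operatorname{lin} M$ the second part of Lemma 16.1 proves $(\operatorname{lin} M)^\perp \subset M^\perp$. If now $y \in M^\perp$ and
$x = \sum_{j=1}^n a_j x_j \in \operatorname{lin} M$ are given, it follows that $\langle y, x\rangle = \sum_{j=1}^n a_j \langle y, x_j\rangle = 0$ since
all $x_j$ belong to $M$, thus $y \in (\operatorname{lin} M)^\perp$ and therefore the equality $M^\perp = (\operatorname{lin} M)^\perp$.

Since $M \subset [M]$ we know by the second part of Lemma 16.1 that $[M]^\perp$ is contained
in $M^\perp$. In order to show the converse $M^\perp \subset [M]^\perp$, take any $y \in M^\perp$. Every point
$x \in [M]$ can be represented as a limit $x = \lim_{j\to\infty} x_j$ of points $x_j \in \operatorname{lin} M$. Since
we know $M^\perp = (\operatorname{lin} M)^\perp$ we deduce as above that $\langle y, x\rangle = \lim_{j\to\infty} \langle y, x_j\rangle = 0$.
This proves $y \in [M]^\perp$ and we conclude.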

From elementary Euclidean geometry we know that given any line in $\mathbb{R}^2$, i.e., a one
dimensional subspace of $\mathbb{R}^2$, we can write any vector $x \in \mathbb{R}^2$ in precisely one way as a
sum of two vectors where one vector is the projection of $x$ onto this line and the other
vector is perpendicular to it. A similar statement holds if a two dimensional plane is
given in $\mathbb{R}^3$. The following important result extends this orthogonal decomposition
to any Hilbert space.


Theorem 16.1 (Projection Theorem) Suppose $M$ is a closed subspace of a Hilbert
space $\mathcal{H}$. Every vector $x \in \mathcal{H}$ has the unique representation

$$x = u + v, \qquad u = u(x) \in M, \quad v = v(x) \in M^\perp,$$

and one has

$$\|v\| = \inf_{y \in M} \|x - y\| = d(x, M)$$

where $d(x,M)$ denotes the distance of the vector $x$ from the subspace $M$. Or
equivalently,

$$\mathcal{H} = M \oplus M^\perp,$$

i.e., the Hilbert space $\mathcal{H}$ is the direct orthogonal sum of the closed subspace $M$ and
its orthogonal complement $M^\perp$.


Proof Given any $x \in \mathcal{H}$ we have, for $u \in M$,

$$\|x - u\| \geq \inf_{v \in M} \|x - v\| = d(x, M).$$

There is a sequence $(u_n)_{n\in\mathbb{N}} \subset M$ such that

$$d = d(x, M) = \lim_{n\to\infty} \|x - u_n\|.$$

The parallelogram law (Lemma 15.2) implies that this sequence is actually a Cauchy
sequence as the following calculation shows:

$$\begin{aligned}
\|u_n - u_m\|^2 &= \|(u_n - x) + (x - u_m)\|^2 \\
&= 2\|u_n - x\|^2 + 2\|x - u_m\|^2 - \|(u_n - x) - (x - u_m)\|^2 \\
&= 2\|u_n - x\|^2 + 2\|x - u_m\|^2 - 4\left\|\tfrac{1}{2}(u_n + u_m) - x\right\|^2.
\end{aligned}$$

Since $u_n$ and $u_m$ belong to $M$ their convex combination $\frac{1}{2}(u_n + u_m)$ is an element of
$M$ too and thus, by definition of $d$, $\|\frac{1}{2}(u_n + u_m) - x\|^2 \geq d^2$. It follows that

$$0 \leq \|u_n - u_m\|^2 \leq 2\|u_n - x\|^2 + 2\|x - u_m\|^2 - 4d^2 \longrightarrow 0 \quad \text{as } n, m \to \infty.$$

Hence $(u_n)_{n\in\mathbb{N}} \subset M$ is a Cauchy sequence in the Hilbert space $\mathcal{H}$ and thus converges
to a unique element $u \in \overline{M} = M$ since $M$ is closed. By construction one has

$$d = \lim_{n\to\infty} \|x - u_n\| = \|x - u\|.$$

Next we show that the element $v = x - u$ belongs to the orthogonal complement
of $M$. For $y \in M$, $y \neq 0$, introduce $\alpha = -\frac{\langle y, v\rangle}{\langle y, y\rangle}$. For arbitrary $z \in M$, we know
$z - \alpha y \in M$ and thus in particular $d^2 \leq \|x - (u - \alpha y)\|^2 = \|v + \alpha y\|^2 = \|v\|^2 +
|\alpha|^2\|y\|^2 + \langle v, \alpha y\rangle + \langle \alpha y, v\rangle$, hence $d^2 \leq d^2 - \frac{|\langle y, v\rangle|^2}{\|y\|^2}$ and this estimate implies
$\langle y, v\rangle = 0$. Since this argument applies to every $y \in M$, $y \neq 0$, we deduce that $v$
belongs to the orthogonal complement $M^\perp$ of $M$.

Finally, we show uniqueness of the decomposition of elements $x \in \mathcal{H}$ into a
component $u$ parallel to the closed subspace $M$ and a component $v$ orthogonal to it.
Assume that $x \in \mathcal{H}$ has two such decompositions:

$$x = u_1 + v_1 = u_2 + v_2, \qquad u_i \in M, \quad v_i \in M^\perp, \quad i = 1,2.$$

It follows that $u_1 - u_2 = v_2 - v_1 \in M \cap M^\perp$. By part 4 of Lemma 16.1 we conclude
$u_1 - u_2 = v_2 - v_1 = 0 \in \mathcal{H}$, hence this decomposition is unique.

Recall that in the Euclidean space $\mathbb{R}^2$ the shortest distance between a point $x$ and
a line $M$ is given by the distance between $x$ and the point $u$ on the line $M$ which is
the intersection of the line $M$ and the line perpendicular to $M$ through the point $x$.
The projection theorem says that this result holds in any Hilbert space and for any
closed linear subspace $M$.
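
The two dimensional picture can be checked directly. The following Python sketch assumes a concrete point x and a line M through the origin spanned by a vector y; both are chosen only for illustration, as are the function names. It computes the decomposition $x = u + v$ and compares $\|v\|$ with the distance from $x$ to sampled points on the line.

```python
# Sketch: orthogonal decomposition x = u + v for a line M = span{y} in R^2.
from math import hypot

def inner(a, b):
    return sum(s * t for s, t in zip(a, b))

def project_onto_line(x, y):
    """Orthogonal projection of x onto the line spanned by y != 0."""
    c = inner(y, x) / inner(y, y)
    return [c * t for t in y]

x = [3.0, 1.0]                      # assumed sample point
y = [1.0, 1.0]                      # assumed direction of the line M
u = project_onto_line(x, y)         # component in M
v = [a - b for a, b in zip(x, u)]   # component in the orthogonal complement
d = hypot(v[0], v[1])               # candidate for d(x, M)

# Distances from x to sampled points t*y on the line, t in [-5, 5].
others = [hypot(x[0] - t * y[0], x[1] - t * y[1])
          for t in (k / 10 for k in range(-50, 51))]
print(u, v, d, min(others))
```

None of the sampled points on the line is closer to $x$ than $u$, in agreement with $\|v\| = d(x, M)$.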

As an easy consequence of the projection theorem one obtains a detailed description
of the bi-orthogonal complement $M^{\perp\perp}$ of a set $M$ which is defined as the
orthogonal complement of the orthogonal complement of $M$, i.e., $M^{\perp\perp} = (M^\perp)^\perp$.


Corollary 16.1 For any subset $M$ in a Hilbert space $\mathcal{H}$ one has

$$M^{\perp\perp} = [M] \qquad \text{and} \qquad M^{\perp\perp\perp} = (M^{\perp\perp})^\perp = M^\perp. \tag{16.1}$$

In particular, if $M$ is a linear subspace, $M^{\perp\perp} = \overline{M}$, and if $M$ is a closed linear
subspace $M^{\perp\perp} = M$.


Proof Obviously one has $M \subset M^{\perp\perp}$. By Lemma 16.2 the bi-orthogonal complement
of a set $M$ is known to be a closed linear subspace of $\mathcal{H}$, hence the closed linear
hull $[M]$ of $M$ is contained in $M^{\perp\perp}$: $[M] \subset M^{\perp\perp}$.

Given any $x \in M^{\perp\perp}$ there are $u \in [M]$ and $v \in [M]^\perp$ such that $x = u + v$
(projection theorem). Since $x - u \in M^{\perp\perp} - [M] \subset M^{\perp\perp}$ and $[M]^\perp = M^\perp$ (Lemma
16.2), it follows that $v = x - u \in M^{\perp\perp} \cap M^\perp$. But this intersection is trivial by
Lemma 16.1, therefore $x = u \in [M]$; this proves $M^{\perp\perp} \subset [M]$ and together with
the opposite inclusion shown above, $M^{\perp\perp} = [M]$. In order to show the second part
we take the orthogonal complement of the identity we have just shown and apply
Lemma 16.2 to conclude $M^{\perp\perp\perp} = [M]^\perp = M^\perp$.


Remark 16.1 Naturally one can ask whether the assumptions in the projection the- 
orem can be weakened. By considering examples we see that this is not possible in 
the case of infinite dimensional spaces. 


a) For a linear subspace of an infinite dimensional Hilbert space which is not closed 
or for a closed linear subspace of an infinite dimensional inner product space 
which is not complete, one can construct examples which show that in these 
cases the projection theorem does not hold (see Exercises). 

b) The projection theorem also does not hold for closed linear subspaces of a Banach 
space which are not Hilbert spaces. In these cases, the uniqueness statement in 
the projection theorem is not assured (see Exercises). 




There is, however, a direction in which the projection theorem can be generalized.
One is allowed to replace the closed linear subspace $M$ by a closed convex set $M$.
Recall that a subset $M$ of a vector space is called convex if all the convex combinations
$\lambda x + (1 - \lambda)y$, $0 \leq \lambda \leq 1$, belong to $M$ whenever $x, y$ do. Thus one arrives at the
projection theorem for closed convex sets. According to the methods used for its
proof we present this result in Part C on variational methods, Theorem 36.1.

Recall that a subset $D$ of a Hilbert space $\mathcal{H}$ is called dense in $\mathcal{H}$ if every open ball
$B_r(x) \subset \mathcal{H}$ has a nonempty intersection with $D$, i.e., if the closure $\overline{D}$ of $D$ is equal
to $\mathcal{H}$. Closely related to dense subsets are the “total” subsets, i.e., those sets whose
linear hull is dense. They play an important role in the study of linear functions. The
formal definition reads:


Definition 16.3 A subset M of a Hilbert space H is called total if, and only if, the 
closed linear hull of M equals H, i.e., in the notation introduced earlier, if, and only 
if, [M] =H. 

The results on orthogonal complements and their relation to the closed linear hull 
give a very convenient and much used characterization of total sets. 


Corollary 16.2 A subset $M$ of a Hilbert space $\mathcal{H}$ is total if, and only if, the
orthogonal complement of $M$ is trivial: $M^\perp = \{0\}$.


Proof If $M$ is total, then $[M] = \mathcal{H}$ and thus $[M]^\perp = \mathcal{H}^\perp = \{0\}$. Lemma 16.2
implies $M^\perp = \{0\}$.

If conversely $M^\perp = \{0\}$, then $M^{\perp\perp} = \{0\}^\perp = \mathcal{H}$. But by Corollary 16.1 we
know $[M] = M^{\perp\perp}$. Thus we conclude.


16.2 Gram Determinants 


If we are given a closed subspace M of a Hilbert space H and a point x € H there is, 
according to the projection theorem, a unique element of best approximation u € M, 
i.e., an element u € M such that ||x — u|| is minimal. In concrete applications, one 
often has to calculate this element of best approximation explicitly. In general, this 
is a rather difficult task. However, if M is a finite dimensional subspace of a Hilbert 
space, there is a fairly simple solution to this problem, based on the concept of Gram 
determinants. 

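Suppose that $M$ is a subspace of dimension $n$ with basis $\{x_1,\ldots,x_n\}$. The
projection theorem implies: Given $x \in \mathcal{H}$ there are a unique $u = u(x) \in M$ and a
unique $v = v(x) \in M^\perp$ such that $x = u + v$. The element $u \in M$ has a unique representation
in terms of the elements of the basis: $u = \lambda_1 x_1 + \cdots + \lambda_n x_n$, $\lambda_j \in \mathbb{K}$. Since
$v = x - u \in M^\perp$ we know for $k = 1,\ldots,n$ that $\langle x_k, x - u\rangle = 0$ or $\langle x_k, x\rangle = \langle x_k, u\rangle$.
Inserting the above representation of $u \in M$ we get a linear system for the unknown
coefficients $\lambda_1,\ldots,\lambda_n$:

$$\sum_{j=1}^n \lambda_j \langle x_k, x_j\rangle = \langle x_k, x\rangle, \qquad k = 1,\ldots,n. \tag{16.2}$$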




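The determinant of this linear system is called the Gram determinant. It is defined
in terms of the inner products of basis elements $x_1,\ldots,x_n$:

$$G(x_1,\ldots,x_n) = \det \begin{pmatrix}
\langle x_1,x_1\rangle & \langle x_1,x_2\rangle & \cdots & \langle x_1,x_n\rangle \\
\langle x_2,x_1\rangle & \langle x_2,x_2\rangle & \cdots & \langle x_2,x_n\rangle \\
\vdots & \vdots & & \vdots \\
\langle x_n,x_1\rangle & \langle x_n,x_2\rangle & \cdots & \langle x_n,x_n\rangle
\end{pmatrix}. \tag{16.3}$$

Certainly, the function $G$ is well defined for any finite number of vectors of an inner
product space.

Next we express the distance $d = d(x, M) = \|x - u\| = \|v\|$ of the point $x$ from
the subspace $M$ in terms of the coefficients $\lambda_j$. A straightforward calculation gives:

$$d^2 = \|x - u\|^2 = \langle x - u, x - u\rangle = \langle x, x - u\rangle = \langle x, x\rangle - \langle x, u\rangle
= \|x\|^2 - \sum_{j=1}^n \lambda_j \langle x, x_j\rangle.$$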


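This identity and the linear system (16.2) are written as one homogeneous linear
system for the coefficients $(\lambda_0 = 1, \lambda_1, \ldots, \lambda_n)$:

$$\begin{aligned}
(d^2 - \|x\|^2)\lambda_0 + \sum_{j=1}^n \lambda_j \langle x, x_j\rangle &= 0, \\
-\langle x_k, x\rangle \lambda_0 + \sum_{j=1}^n \lambda_j \langle x_k, x_j\rangle &= 0, \qquad k = 1,\ldots,n.
\end{aligned} \tag{16.4}$$

By the projection theorem it is known that this homogeneous linear system has a
nontrivial solution $(\lambda_0, \lambda_1, \ldots, \lambda_n) \neq (0,0,\ldots,0)$. Hence the determinant of this
system vanishes, i.e.,

$$\det \begin{pmatrix}
d^2 - \langle x,x\rangle & \langle x,x_1\rangle & \cdots & \langle x,x_n\rangle \\
-\langle x_1,x\rangle & \langle x_1,x_1\rangle & \cdots & \langle x_1,x_n\rangle \\
\vdots & \vdots & & \vdots \\
-\langle x_n,x\rangle & \langle x_n,x_1\rangle & \cdots & \langle x_n,x_n\rangle
\end{pmatrix} = 0.$$

Elementary properties of determinants thus give

$$d^2\, G(x_1,\ldots,x_n) - G(x,x_1,\ldots,x_n) = 0. \tag{16.5}$$

The Gram determinant of two vectors is:

$$G(x_1, x_2) = \|x_1\|^2 \|x_2\|^2 - |\langle x_1, x_2\rangle|^2.$$

Schwarz' inequality shows $G(x_1, x_2) \geq 0$. Now an induction with respect to $n \geq 2$,
using Eq. (16.5), proves the following theorem which gives in particular an explicit
way to calculate the distance of a point from a finite dimensional subspace.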




Theorem 16.2 (Gram Determinants) In a Hilbert space $\mathcal{H}$ define the Gram
determinants by Eq. (16.3). Then the following holds:

1. $G(x_1,\ldots,x_n) \geq 0$ for all $x_1,\ldots,x_n \in \mathcal{H}$;

2. $G(x_1,\ldots,x_n) > 0 \iff \{x_1,\ldots,x_n\}$ is a linearly independent set;

3. If $x_1,\ldots,x_n$ are linearly independent vectors in $\mathcal{H}$, denote by $[\{x_1,\ldots,x_n\}]$ the
closed linear subspace generated by $\{x_1,\ldots,x_n\}$. Then the distance of any point
$x \in \mathcal{H}$ from the subspace $[\{x_1,\ldots,x_n\}]$ is

$$d = d(x, [\{x_1,\ldots,x_n\}]) = \sqrt{\frac{G(x, x_1, \ldots, x_n)}{G(x_1, \ldots, x_n)}}. \tag{16.6}$$

The proof of this result and some generalizations of Schwarz' inequality given by
part 1 of Theorem 16.2 are discussed in the Exercises.
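
The distance formula can be tried out on a small example. The Python sketch below is restricted to real vectors (so that no complex conjugation is needed) and uses exact rational arithmetic; the vectors x1, x2, x and all function names are assumptions made for this illustration. The subspace spanned by x1 and x2 is the plane of vectors with vanishing third component, so the expected squared distance of x = (2, 3, 4) is 16.

```python
# Sketch: Gram determinants and the distance of a point from a subspace in R^3.
from fractions import Fraction

def inner(a, b):
    return sum(s * t for s, t in zip(a, b))

def det(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(c) for c in row] for row in matrix]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign                # a row swap changes the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return sign * result

def gram(*vectors):
    """Gram determinant of the given real vectors."""
    return det([[inner(a, b) for b in vectors] for a in vectors])

x1, x2 = [1, 0, 0], [1, 1, 0]   # assumed basis of the subspace
x = [2, 3, 4]                   # assumed point
dist_sq = gram(x, x1, x2) / gram(x1, x2)
print(gram(x1, x2), gram(x, x1, x2), dist_sq)
print(gram([1, 2, 3], [2, 4, 6]))   # linearly dependent pair
```

The linearly dependent pair in the last line has vanishing Gram determinant, in agreement with part 2 of the theorem.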


16.3 The Dual of a Hilbert Space


Recall that the (topological) dual of a topological vector space $V$ is defined as the
space of all continuous linear functions $T : V \to \mathbb{K}$. In general it is not known
how to determine the form of the elements of the topological dual explicitly, even in
the case of Banach spaces. However, in the case of a Hilbert space $\mathcal{H}$, the additional
information that the norm is induced by an inner product suffices to easily determine
the explicit form of continuous linear functions $T : \mathcal{H} \to \mathbb{K}$. Recall that a linear
function $T : \mathcal{H} \to \mathbb{K}$ is continuous if, and only if, it is bounded, i.e., if, and only
if, there is some constant $C_T$ such that $|T(x)| \leq C_T \|x\|$ for all $x \in \mathcal{H}$. Recall
furthermore that under pointwise addition and scalar multiplication the set of all
linear functions $T : \mathcal{H} \to \mathbb{K}$ is a vector space over the field $\mathbb{K}$ (see Exercises). For
bounded linear functions $T : \mathcal{H} \to \mathbb{K}$ one defines

$$\|T\|' = \sup\{|T(x)| : x \in \mathcal{H},\ \|x\| \leq 1\}. \tag{16.7}$$

In the Exercises it is shown that this defines a norm on the space $\mathcal{H}'$ of all bounded
linear functions on $\mathcal{H}$. Explicit examples of elements of $\mathcal{H}'$ are all $T_u$, $u \in \mathcal{H}$, defined
by

$$T_u(x) = \langle u, x\rangle \qquad \forall\, x \in \mathcal{H}. \tag{16.8}$$

The properties of inner products easily imply that the functions $T_u$, $u \in \mathcal{H}$, are linear.
Schwarz' inequality $|\langle u, x\rangle| \leq \|u\|\,\|x\|$ shows that these linear functions are bounded
and thus continuous. And it follows immediately that $\|T_u\|' \leq \|u\|$, for all $u \in \mathcal{H}$.
Since $T_u(u) = \|u\|^2$ one actually has equality in this estimate:

$$\|T_u\|' = \|u\| \qquad \forall\, u \in \mathcal{H}. \tag{16.9}$$




The following theorem characterizes continuous linear functions on a Hilbert space
explicitly. This representation theorem says that all elements of $\mathcal{H}'$ are of the form
$T_u$ with some $u \in \mathcal{H}$.


Theorem 16.3 (Riesz–Fréchet) Let $\mathcal{H}$ be a Hilbert space over the field $\mathbb{K}$. A linear
function $T : \mathcal{H} \to \mathbb{K}$ is continuous if, and only if, there is a $u \in \mathcal{H}$ such that $T = T_u$,
and one has $\|T\|' = \|u\|$.


Proof According to the discussion preceding the theorem we have to show that every
continuous linear functional $T$ on the Hilbert space $\mathcal{H}$ is of the form $T_u$ for some
$u \in \mathcal{H}$. If $T = 0$ is the null functional, choose $u = 0$. If $T \neq 0$, then

$$M = \ker T = T^{-1}(0) = \{x \in \mathcal{H} : T(x) = 0\}$$

is a closed linear subspace of $\mathcal{H}$ which is not equal to $\mathcal{H}$. This is shown in the Exercises.
It follows that the orthogonal complement $M^\perp$ of $M$ is not trivial (Corollary
16.1) and the projection theorem states that the Hilbert space has a decomposition
into two nontrivial closed linear subspaces: $\mathcal{H} = M \oplus M^\perp$. Hence there is a
$v \in M^\perp$ such that $T(v) \neq 0$ and thus, for every $x \in \mathcal{H}$ we can define the element
$u = u(x) = x - \frac{T(x)}{T(v)}\, v$. Linearity of $T$ implies $T(u) = 0$ and thus $u \in M$, therefore
$\langle v, u(x)\rangle = 0$ for all $x \in \mathcal{H}$, or

$$T(x) = \frac{T(v)}{\langle v, v\rangle}\, \langle v, x\rangle \qquad \forall\, x \in \mathcal{H}.$$

This proves $T = T_u$ for the element

$$u = \frac{\overline{T(v)}}{\langle v, v\rangle}\, v \in (\ker T)^\perp.$$

It follows that $\|T\|' = \|T_u\|' = \|u\|$.

Finally we show uniqueness of the element $u \in \mathcal{H}$ which defines the given continuous
linear function $T$ by $T = T_u$. Suppose $u, v \in \mathcal{H}$ define $T$ by this relation.
Then $T_u = T_v$ or $\langle u, x\rangle = \langle v, x\rangle$ for all $x \in \mathcal{H}$ and hence $u - v \in \mathcal{H}^\perp = \{0\}$, i.e.,
$u = v$.
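
In finite dimensions the representing vector can be written down explicitly. The Python sketch below assumes a functional $T(x) = \sum_j c_j x_j$ on $\mathbb{C}^3$ with coefficients chosen only for illustration, and the inner product antilinear in the first argument as in Eq. (16.8). It checks that $u = (\overline{c_j})_j$ represents $T$, that the construction in the proof applied to a vector $v \in (\ker T)^\perp$ gives the same $u$, and that $|T(x)|$ attains the value $\|u\|$ on the unit sphere.

```python
# Sketch: Riesz-Frechet representation of a linear functional on C^3.
from math import sqrt

def inner(u, x):
    """Inner product on C^n, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, x))

coeffs = [2 + 1j, -1j, 3 + 0j]      # assumed coefficients of T

def T(x):
    return sum(c * t for c, t in zip(coeffs, x))

u = [c.conjugate() for c in coeffs]  # candidate with T(x) = <u, x>
norm_u = sqrt(inner(u, u).real)

# Construction from the proof with v in (ker T)^perp, here v = 2u:
v = [2 * a for a in u]
scale = T(v).conjugate() / inner(v, v)
u_from_proof = [scale * a for a in v]

x = [1 - 1j, 4 + 0j, 2j]             # arbitrary test vector
unit = [a / norm_u for a in u]       # unit vector in the direction of u
print(T(x), inner(u, x), abs(T(unit)), norm_u)
```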


Corollary 16.3 A Hilbert space $\mathcal{H}$ and its (topological) dual $\mathcal{H}'$ are isometrically
anti-isomorphic, i.e., there is an isometric map $J : \mathcal{H} \to \mathcal{H}'$ which is antilinear.


Proof Define a map $J : \mathcal{H} \to \mathcal{H}'$ by $J(u) = T_u$ for all $u \in \mathcal{H}$ where $T_u$ is defined
by Eq. (16.8). Thus $J$ is well defined and we know $\|J(u)\|' = \|T_u\|' = \|u\|$. Hence
the map $J$ is isometric. The definition of $T_u$ easily implies

$$J(\alpha u + \beta v) = \overline{\alpha}\, J(u) + \overline{\beta}\, J(v) \qquad \forall\, \alpha, \beta \in \mathbb{K},\ \forall\, u, v \in \mathcal{H},$$

i.e., the map $J$ is antilinear. As an isometric map $J$ is injective, and by the
Riesz–Fréchet Theorem we know that it is surjective. Hence $J$ is an isometric
anti-isomorphism.




Remark 16.2


1. The theorem of Riesz and Fréchet relies in a decisive way on the assumption of
completeness. This theorem does not hold in inner product spaces which are not
complete. An example is discussed in the Exercises.

2. The duality property of a Hilbert space $\mathcal{H}$, i.e., $\mathcal{H} \cong \mathcal{H}'$, is used in the bra and
ket vector notation of Dirac. For vectors $u \in \mathcal{H}$ Dirac writes a ket vector $|u\rangle$
and for elements $T_v \in \mathcal{H}'$ he writes the bra vector $\langle v|$. Bra vectors act on ket
vectors according to the relation $\langle v|u\rangle = T_v(u)$, $u, v \in \mathcal{H}$. In this notation the
projector $P_\psi$ onto the subspace spanned by the vector $\psi$ is $P_\psi = |\psi\rangle\langle\psi|$.


Every continuous linear function $T : \mathcal{H} \to \mathbb{K}$ is of the form $T = T_u$ for a unique
element $u$ in the Hilbert space $\mathcal{H}$, by Theorem 16.3. This implies the following
orthogonal decomposition of the Hilbert space: $\mathcal{H} = \ker T \oplus \mathbb{K}u$, i.e., the kernel
or null space of a continuous linear functional on a Hilbert space is a closed linear
subspace of co-dimension 1. This says in particular that a continuous linear functional
“lives” on the one dimensional subspace $\mathbb{K}u$. This is actually the case in the general
setting of locally convex topological vector spaces as the Exercises show.

The Theorem of Riesz and Fréchet has many other applications. We discuss here
an easy solution of the extension problem, i.e., the problem of finding a continuous
linear functional $T$ on the Hilbert space $\mathcal{H}$ which agrees with a given continuous
linear functional $T_0$ on a linear subspace $M$ of $\mathcal{H}$ and which has the same norm as
$T_0$.


Theorem 16.4 (Extension Theorem) Let $M$ be a linear subspace of a Hilbert space
$\mathcal{H}$ and $T_0 : M \to \mathbb{K}$ a continuous linear functional, i.e., there is some constant $C$
such that $|T_0(x)| \leq C\|x\|$ for all $x \in M$. Then there is exactly one continuous linear
functional $T : \mathcal{H} \to \mathbb{K}$ such that $T|M = T_0$ and $\|T\|' = \|T_0\|'$ where the definition

$$\|T_0\|' = \sup\{|T_0(x)| : x \in M,\ \|x\| \leq 1\}$$

is used.


Proof The closure $\overline{M}$ of the linear subspace $M$ is itself a Hilbert space, when
we use the restriction of the inner product $\langle\cdot,\cdot\rangle$ of $\mathcal{H}$ to $\overline{M}$. This is shown as an
exercise. We show next that $T_0$ has a unique extension $T_1$ to a continuous linear
function $\overline{M} \to \mathbb{K}$. Given $x \in \overline{M}$ there is a sequence $(x_n)_{n\in\mathbb{N}}$ in $M$ which converges
to $x$. Define $T_1(x) = \lim_{n\to\infty} T_0(x_n)$. This limit exists since the field $\mathbb{K}$ is complete and
$(T_0(x_n))_{n\in\mathbb{N}}$ is a Cauchy sequence in $\mathbb{K}$: We have the estimate $|T_0(x_n) - T_0(x_m)| =
|T_0(x_n - x_m)| \leq C\|x_n - x_m\|$, and we know that $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in the
Hilbert space $\overline{M}$. If we take another sequence $(y_n)_{n\in\mathbb{N}}$ in $M$ with limit $x$ we know
$|T_0(x_n) - T_0(y_n)| = |T_0(x_n - y_n)| \leq C\|x_n - y_n\| \to 0$ as $n \to \infty$ and thus both
sequences give the same limit $T_1(x)$. It follows that

$$\|T_1\|' = \sup\{|T_1(x)| : x \in \overline{M},\ \|x\| \leq 1\} = \sup\{|T_0(x)| : x \in M,\ \|x\| \leq 1\} = \|T_0\|' \leq C.$$

The second identity is shown in the Exercises.




The Theorem of Riesz–Fréchet implies: There is exactly one $v \in \overline{M}$ such that
$T_1(u) = \langle v, u\rangle$ for all $u \in \overline{M}$ and $\|T_1\|' = \|v\|$. Since the inner product is actually
defined on all of $\mathcal{H}$, we get an easy extension $T$ of $T_1$ to the Hilbert space $\mathcal{H}$ by
defining $T(x) = \langle v, x\rangle$ for all $x \in \mathcal{H}$ and it follows that $\|T\|' = \|v\| = \|T_0\|'$.

This functional $T$ is an extension of $T_0$ since for all $u \in M$ one has $T(u) =
\langle v, u\rangle = T_1(u) = T_0(u)$, by definition of $T_1$.

Suppose that $S$ is a continuous linear extension of $T_0$. As a continuous linear
functional on $\mathcal{H}$ this extension is of the form $S(x) = \langle y, x\rangle$, for all $x \in \mathcal{H}$, with a
unique $y \in \mathcal{H}$. And, since $S$ is an extension of $T_0$, we know $S(u) = T_0(u) = \langle v, u\rangle$
for all $u \in M$ and thus for all $u \in \overline{M}$. This shows $\langle y - v, u\rangle = 0$ for all $u \in \overline{M}$,
hence $y - v \in M^\perp$, and we deduce $\|S\|' = \|y\| = \sqrt{\|v\|^2 + \|y - v\|^2}$. Hence this
extension $S$ satisfies $\|S\|' = \|T_0\|' = \|v\|$ if, and only if, $y - v = 0$, i.e., if, and only
if, $S = T = T_v$, and we conclude.
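
The mechanism of the proof can be tried out in a finite-dimensional setting. The sketch below is only an illustration under assumptions made here: $\mathcal{H} = \mathbb{R}^3$, $M$ spanned by two orthonormal vectors `e1`, `e2`, and $T_0$ given as the restriction of $x \mapsto \langle w, x\rangle$ for a hypothetical vector `w`. It builds the representing vector $v \in M$ by projection and compares the norm-preserving extension with another extension.

```python
import math

def dot(a, b):
    """Euclidean inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

# Hypothetical data: H = R^3 and M = span{e1, e2} with e1, e2 orthonormal.
e1 = (1.0, 0.0, 0.0)
e2 = (0.0, 1.0, 0.0)

# T0 is known only on M; here it is the restriction of x -> <w, x> to M.
w = (3.0, 4.0, 12.0)

# Riesz-Frechet inside M: the representing vector v is the projection of w onto M.
v = tuple(dot(e1, w) * a + dot(e2, w) * b for a, b in zip(e1, e2))

def T(x):
    """Norm-preserving extension T(x) = <v, x> on all of H."""
    return dot(v, x)

def S(x):
    """Another extension S(x) = <y, x> with y = w, so that y - v is orthogonal to M."""
    return dot(w, x)

diff = tuple(a - b for a, b in zip(w, v))
print("norm of T:", norm(v))
print("norm of S:", norm(w), "=", math.sqrt(norm(v) ** 2 + norm(diff) ** 2))
```

Both functionals agree on $M$, but only $T$ has the norm of $T_0$; the norm of $S$ is larger by the Pythagorean contribution of $y - v$, as in the uniqueness part of the proof.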

Methods and results from the theory of Hilbert spaces and their operators are used 
in various areas of mathematics. We present here an application of Theorem 16.4 to 
a problem from distribution theory, namely to prove the existence of a fundamental 
solution for a special constant coefficient partial differential operator. Earlier we 
had used Fourier transformation for distributions to find a fundamental solution for 
this type of differential operator. The proof of the important Theorem 8.2 follows a 
similar strategy. 


Corollary 16.4 The linear partial differential operator with constant coefficients

$$1 - \Delta_n \quad \text{in } \mathbb{R}^n$$

has a fundamental solution in $\mathcal{S}'(\mathbb{R}^n)$.


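Proof Consider the subspace $M = (1 - \Delta_n)^m \mathcal{D}(\mathbb{R}^n)$ of the Hilbert space $\mathcal{H} = L^2(\mathbb{R}^n)$
and define a linear functional $T_0 : M \to \mathbb{K}$ by

$$T_0((1 - \Delta_n)^m \phi) = \phi(0) \qquad \forall\, \phi \in \mathcal{D}(\mathbb{R}^n).$$

Applying Lemma 10.1 to the inverse Fourier transformation one has the estimate

$$|\phi(0)| \le \|\phi\|_\infty \le (2\pi)^{-n/2} \|\mathcal{F}\phi\|_1.$$

If $2m > n$, then the function $p \mapsto (1 + p^2)^{-m}$ belongs to the Hilbert space $L^2(\mathbb{R}^n)$,
and we can use Schwarz' inequality to estimate the norm of $\tilde\phi = \mathcal{F}\phi$ as follows:

$$\|\tilde\phi\|_1 = \|(1 + p^2)^{-m} \cdot (1 + p^2)^m \tilde\phi\|_1 \le \|(1 + p^2)^{-m}\|_2 \, \|(1 + p^2)^m \tilde\phi\|_2.$$

By Theorems 10.7 and 10.2 we know $\|(1 + p^2)^m \tilde\phi\|_2 = \|(1 - \Delta_n)^m \phi\|_2$ and thus the
estimate

$$|\phi(0)| \le C \|(1 - \Delta_n)^m \phi\|_2 \qquad \forall\, \phi \in \mathcal{D}(\mathbb{R}^n)$$

follows, with a constant $C$ which is given by the above calculations. This estimate
shows first that the functional $T_0$ is well defined on $M$. It is easy to see that $T_0$ is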




linear. Now the above estimate also implies that $T_0$ is continuous. Hence the above
extension theorem can be applied, and thus there is $u \in L^2(\mathbb{R}^n)$ such that

$$T_0((1 - \Delta_n)^m \phi) = \langle u, (1 - \Delta_n)^m \phi\rangle_2 = \int \overline{u(x)}\, (1 - \Delta_n)^m \phi(x)\, dx$$

for all $\phi \in \mathcal{D}(\mathbb{R}^n)$. By definition of $T_0$ this shows that

$$\int \overline{u(x)}\, (1 - \Delta_n)^m \phi(x)\, dx = \phi(0),$$

i.e., the distribution $E = (1 - \Delta_n)^{m-1} \overline{u}$ is a fundamental solution of the operator
$1 - \Delta_n$. Since $u \in L^2(\mathbb{R}^n)$ the distribution $E = (1 - \Delta_n)^{m-1} \overline{u}$ is tempered, and we
conclude.


16.4 Exercises 


1. Prove Lemma 16.1.

2. Prove the first two parts of Lemma 16.2.

3. Find an example supporting the first part of Remark 16.1.

4. Consider the Euclidean space $\mathbb{R}^2$, but equipped with the norm $\|x\| = |x_1| + |x_2|$
for $x = (x_1, x_2) \in \mathbb{R}^2$. Show that this is a Banach space but not a Hilbert space.
Consider the point $x = (-\frac{r}{2}, \frac{r}{2})$ for some $r > 0$ and the closed linear subspace
$M = \{x \in \mathbb{R}^2 : x_1 = x_2\}$. Prove that this point has the distance $r$ from the
subspace $M$, i.e., $\inf \{\|x - u\| : u \in M\} = r$ and that there are infinitely many
points $u \in M$ such that $\|x - u\| = r$. Conclude that the projection theorem does
not hold for Banach spaces which are not Hilbert spaces (compare with part b)
of Remark 16.1).

5. Prove Theorem 16.2.

6. For three vectors $x, y, z$ in a Hilbert space $\mathcal{H}$, calculate the Gram determinant
$G(x, y, z)$ explicitly and discuss in detail the inequality $0 \le G(x, y, z)$. Consider
some special cases: $x \perp y$, $x \perp z$, or $y \perp z$.

7. For a nontrivial continuous linear functional $T : \mathcal{H} \to \mathbb{K}$ on a Hilbert space $\mathcal{H}$
show that its null-space $\ker T$ is a proper closed linear subspace of $\mathcal{H}$.

8. Consider the space of all terminating sequences of elements in $\mathbb{K}$:

$$\ell^2_e(\mathbb{K}) = \{x = (x_1, x_2, \ldots, x_N, 0, 0, \ldots) : x_j \in \mathbb{K}, \ N = N(x) \in \mathbb{N}\}.$$

Obviously one has $\ell^2_e(\mathbb{K}) \subset \ell^2(\mathbb{K})$ and it is naturally a vector space over the field
$\mathbb{K}$; as an inner product on $\ell^2_e(\mathbb{K})$ we take $\langle x, y\rangle_2 = \sum_{j=1}^{\infty} \overline{x_j}\, y_j$. Consider the
sequence $u = (1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n}, \ldots) \in \ell^2(\mathbb{K})$ and use it to define a linear functional
$T = T_u : \ell^2_e(\mathbb{K}) \to \mathbb{K}$ by

$$T(x) = \langle u, x\rangle_2 = \sum_{n=1}^{\infty} \frac{1}{n}\, x_n .$$

This functional is continuous by Schwarz' inequality: $|T(x)| \le \|u\|_2 \|x\|_2$ for all
$x \in \ell^2_e(\mathbb{K})$. Conclude that the theorem of Riesz–Fréchet does not hold for the
inner product space $\ell^2_e(\mathbb{K})$.

9. For a linear subspace $M$ of a Hilbert space with inner product $\langle\cdot,\cdot\rangle$ show:
The closure $\overline{M}$ of $M$ is a Hilbert space when equipped with the restriction of the
inner product $\langle\cdot,\cdot\rangle$ to $\overline{M}$.

10. Prove the identity

$$\sup \{ |T_1(x)| : x \in \overline{M}, \ \|x\| \le 1 \} = \sup \{ |T_0(x)| : x \in M, \ \|x\| \le 1 \}$$

used in the proof of Theorem 16.4.

11. Give an example of a linear functional which is not continuous.
Hints: Consider the real vector space $V$ of all real polynomials $P$ on the interval
$I = [0, 1]$, take a point $a \notin I$, for instance $a = 2$, and define $T_a : V \to \mathbb{R}$ by
$T_a(P) = P(a)$ for all $P \in V$. Show that $T_a$ is not continuous with respect to the
norm $\|P\| = \sup_{x \in I} |P(x)|$ on $V$.


Chapter 17 
Separable Hilbert Spaces 


Up to now we have studied results which are available in any Hilbert space. Now 
we turn our attention to a very important subclass which one encounters in many 
applications, in mathematics as well as in physics. This subclass is characterized by 
the property that the Hilbert space has a countable basis defined in a way suitable 
for Hilbert spaces. Such a “Hilbert space basis” plays the same role as a coordinate 
system in a finite dimensional vector space. 

Recall that two finite dimensional vector spaces are isomorphic if, and only if, they 
have the same dimension. Similarly, Hilbert spaces are characterized up to isomorphy 
by the cardinality of their Hilbert space basis. Those Hilbert spaces which have a 
countable Hilbert space basis are called separable. 

In a first section we introduce and discuss the basic concepts and results in the 
theory of separable Hilbert spaces. Then a special class of separable Hilbert spaces 
is investigated. For this subclass the Hilbert space basis is defined in an explicit way 
through a given weight function and an orthogonalization procedure. These spaces 
play an important role in the study of differential operators, in particular in quantum 
mechanics. 


17.1 Basic Facts 


As indicated above, the concept of a Hilbert space basis differs from the concept of 
a basis in a vector space. The point which distinguishes these two concepts is that 
for the definition of a Hilbert space basis a limit process is used. 

We begin by recalling the concept of a basis in a vector space $V$ over the field $\mathbb{K}$.
A nonempty subset $A \subset V$ is called linearly independent if, and only if, every finite
subset $\{x_1, \ldots, x_n\} \subset A$, $n \in \mathbb{N}$, is linearly independent. A finite subset $\{x_1, \ldots, x_n\}$,
$x_i \neq x_j$ for $i \neq j$, is called linearly independent if, and only if, $\sum_{i=1}^{n} \lambda_i x_i = 0$,
$\lambda_i \in \mathbb{K}$, implies $\lambda_1 = \lambda_2 = \cdots = \lambda_n = 0$, i.e., the only way to write the null
vector $0$ of $V$ as a linear combination of the vectors $x_1, \ldots, x_n$ is the trivial one with
$\lambda_i = 0 \in \mathbb{K}$ for $i = 1, \ldots, n$.


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_17




The set of all vectors in $V$ which can be written as some linear combination of
elements in the given nonempty subset $A$ is called the linear hull $\operatorname{lin} A$ of $A$ (see
Definition 16.2), i.e.,

$$\operatorname{lin} A = \Bigl\{ x \in V : x = \sum_{i=1}^{n} \lambda_i x_i, \ x_i \in A, \ \lambda_i \in \mathbb{K}, \ n \in \mathbb{N} \Bigr\}.$$

It is the smallest linear subspace which contains $A$. A linearly independent subset
$A \subset V$ which generates $V$, i.e., $\operatorname{lin} A = V$, is called a basis of the vector space $V$.

A linearly independent set $A \subset V$ is called maximal if, and only if, for any linearly
independent subset $A'$ the relation $A \subset A'$ implies $A = A'$. In this sense a basis is a
maximal linearly independent subset. This means: If one adds an element $x$ of $V$ to
a basis $B$, then the resulting subset $B \cup \{x\}$ is no longer linearly independent. With
the help of Zorn’s Lemma (or the axiom of choice) one can prove that every vector 
space has a basis. Such a basis is a purely algebraic concept and is often called a 
Hamel basis. 

In 1927, J. Schauder introduced the concept of Hilbert space basis or a basis of a 
Hilbert space, which takes the topological structure of a Hilbert space into account 
as expressed in the following definition: 


Definition 17.1 Let $\mathcal{H}$ be a Hilbert space over the field $\mathbb{K}$ and $B$ a subset of $\mathcal{H}$.


1. $B$ is called a Hilbert space basis of $\mathcal{H}$ if, and only if, $B$ is linearly independent
in the vector space $\mathcal{H}$ and $B$ generates $\mathcal{H}$ in the sense that $[B] = \overline{\operatorname{lin} B} = \mathcal{H}$.

2. The Hilbert space $\mathcal{H}$ is called separable if, and only if, it has a countable Hilbert
space basis $B = \{x_n \in \mathcal{H} : n \in \mathbb{N}\}$ (or a finite basis $B = \{x_1, \ldots, x_N\}$ for some
$N \in \mathbb{N}$).

3. An orthonormal system $B = \{x_\alpha \in \mathcal{H} : \alpha \in A\}$ in $\mathcal{H}$ which is a Hilbert space
basis is called an orthonormal basis or ONB of $\mathcal{H}$.


It is important to realize that in general a Hilbert space basis is not an algebraic basis!
For instance in the case of a separable Hilbert space a general element in $\mathcal{H}$ is known
to have a representation as a series $\sum_{n=1}^{\infty} \lambda_n x_n$ in the elements $x_n$ of the basis but not
as a linear combination.
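
A concrete instance may help. The example below is chosen here for illustration and is not taken from the text: in $\ell^2(\mathbb{K})$ with the unit sequences $e_n = (0, \ldots, 0, 1, 0, \ldots)$ (entry 1 in position $n$) as Hilbert space basis, consider

```latex
x = \Bigl(1, \tfrac12, \tfrac13, \dots\Bigr) = \sum_{n=1}^{\infty} \frac{1}{n}\, e_n \in \ell^2(\mathbb{K}),
\qquad
\Bigl\| x - \sum_{n=1}^{N} \frac{1}{n}\, e_n \Bigr\|_2^2
  = \sum_{n=N+1}^{\infty} \frac{1}{n^2} \;\longrightarrow\; 0 \quad (N \to \infty).
```

The series converges to $x$ in norm, but $x$ has infinitely many nonzero components, so no finite linear combination of the $e_n$ equals $x$.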

Often a separable Hilbert space is defined as a Hilbert space, which has a countable 
dense subset. Sometimes this definition is more convenient. The equivalence of both 
definitions is shown in the Exercises. 

In the original definition of a Hilbert space the condition of separability was 
included. However in 1934 F. Rellich and F. Riesz pointed out that for most parts of 
the theory the separability assumption is not needed. 

Nevertheless, most Hilbert spaces that one encounters in applications are separa- 
ble. In the Exercises we discuss an example of a Hilbert space which is not separable. 
This is the space of almost-periodic functions on the real line R. 

As we know from the Euclidean spaces $\mathbb{R}^n$ it is in general a great advantage in
many problems to work with an orthonormal basis $\{e_1, \ldots, e_n\}$ instead of an arbitrary
basis. Here $e_i$ is the standard unit vector along coordinate axis $i$. In a separable infinite
dimensional Hilbert space the corresponding basis is an orthonormal Hilbert space
basis, or ONB. The proof of the following result describes in detail how to construct
an ONB given any Hilbert space basis. Only the case of a separable Hilbert space
is considered since this is the case which is needed in most applications. Using the
axiom of choice one can also prove the existence of an orthonormal basis in the
case of a nonseparable Hilbert space. In the second section of this chapter we use
this construction to generate explicitly ONB's for concrete Hilbert spaces of square
integrable functions.


Theorem 17.1 (Gram—Schmidt Orthonormalization) Every separable Hilbert 
space H has an orthonormal basis B. 


Proof By definition of a separable Hilbert space there is a countable Hilbert space
basis $B = \{y_n : n \in \mathbb{N}\} \subset \mathcal{H}$ (or a finite basis; we consider explicitly the first
case). Define $z_1 = y_1$; since $B$ is a basis we know $\|y_1\| > 0$ and hence the vector
$z_2 = y_2 - \frac{\langle z_1, y_2\rangle}{\langle z_1, z_1\rangle} z_1$ is well-defined in $\mathcal{H}$. One has $z_1 \perp z_2$ since $\langle z_1, z_2\rangle = 0$. As
elements of the basis $B$ the vectors $y_1$ and $y_2$ are linearly independent, therefore the
vector $z_2$ is not the null vector, and certainly the set of vectors $\{z_1, z_2\}$ generates the
same linear subspace as the set of vectors $\{y_1, y_2\}$: $[\{z_1, z_2\}] = [\{y_1, y_2\}]$.

We proceed by induction and assume that for some $N \in \mathbb{N}$, $N \ge 2$, the set of
vectors $\{z_1, \ldots, z_N\}$ is well-defined and has the following properties:

a) $\|z_j\| > 0$ for all $j = 1, \ldots, N$

b) $\langle z_i, z_j\rangle = 0$ for all $i, j \in \{1, \ldots, N\}$, $i \neq j$

c) The set $\{z_1, \ldots, z_N\}$ generates the same linear subspace as the set $\{y_1, \ldots, y_N\}$,
i.e., $[\{z_1, \ldots, z_N\}] = [\{y_1, \ldots, y_N\}]$.

This allows us to define

$$z_{N+1} = y_{N+1} - \sum_{i=1}^{N} \frac{\langle z_i, y_{N+1}\rangle}{\langle z_i, z_i\rangle}\, z_i .$$

The orthogonality condition (b) easily implies $\langle z_j, z_{N+1}\rangle = 0$ for $j = 1, \ldots, N$.
Hence, the set of vectors $\{z_1, \ldots, z_N, z_{N+1}\}$ is pairwise orthogonal too. From the
definition of the vector $z_{N+1}$ it is clear that $[\{z_1, \ldots, z_{N+1}\}] = [\{y_1, \ldots, y_{N+1}\}]$ holds.
Finally, since the vector $y_{N+1}$ is not a linear combination of the vectors $y_1, \ldots, y_N$
the vector $z_{N+1}$ is not zero. This shows that the set of vectors $\{z_1, \ldots, z_N, z_{N+1}\}$ too
has the properties (a), (b), and (c). By the principle of induction we conclude: There
is a set of vectors $\{z_k \in \mathcal{H} : k \in \mathbb{N}\}$ such that $\langle z_j, z_k\rangle = 0$ for all $j, k \in \mathbb{N}$, $j \neq k$, and
$[\{z_k : k \in \mathbb{N}\}] = [\{y_k : k \in \mathbb{N}\}] = \mathcal{H}$. Finally we normalize the vectors $z_k$ to obtain
an orthonormal basis $B = \{e_k : k \in \mathbb{N}\}$, $e_k = \frac{z_k}{\|z_k\|}$.
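
The inductive construction in the proof translates directly into a short procedure. The following is a minimal sketch for finitely many vectors in $\mathbb{R}^n$ with the Euclidean inner product; the function names and the sample vectors are illustrative choices made here, and the input is assumed to be linearly independent (otherwise the final normalization divides by zero).

```python
import math

def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(ys):
    """Orthonormalize linearly independent vectors y_1, ..., y_N.

    Follows the proof: z_{N+1} = y_{N+1} - sum_i <z_i, y_{N+1}> / <z_i, z_i> * z_i,
    then e_k = z_k / ||z_k||.
    """
    zs = []
    for y in ys:
        z = list(y)
        for zi in zs:
            coeff = inner(zi, y) / inner(zi, zi)
            z = [a - coeff * b for a, b in zip(z, zi)]
        zs.append(z)
    return [[a / math.sqrt(inner(z, z)) for a in z] for z in zs]

# Illustrative input: a (non-orthogonal) basis of R^3.
ys = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
es = gram_schmidt(ys)

# Gram matrix of the result; it should be the identity up to rounding.
gram = [[inner(a, b) for b in es] for a in es]
for row in gram:
    print([round(entry, 12) for entry in row])
```

In floating-point arithmetic this classical form can lose orthogonality for nearly dependent inputs; the sketch is meant to mirror the proof, not to be a numerically robust routine.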


Theorem 17.2 (Characterization of ONB's) Let $B = \{x_n : n \in \mathbb{N}\}$ be an
orthonormal system in a separable Hilbert space $\mathcal{H}$. The following statements are
equivalent.


a) $B$ is maximal (or complete), i.e., an ONB.
b) For any $x \in \mathcal{H}$ the condition "$\langle x_n, x\rangle = 0$ for all $n \in \mathbb{N}$" implies $x = 0$.
c) Every $x \in \mathcal{H}$ has the Fourier expansion

$$x = \sum_{n=1}^{\infty} \langle x_n, x\rangle\, x_n .$$

d) For all vectors $x, y \in \mathcal{H}$ the completeness relation

$$\langle x, y\rangle = \sum_{n=1}^{\infty} \langle x, x_n\rangle \langle x_n, y\rangle$$

holds.
e) For every $x \in \mathcal{H}$ the Parseval relation

$$\|x\|^2 = \sum_{n=1}^{\infty} |\langle x_n, x\rangle|^2$$

holds.


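Proof (a) $\Rightarrow$ (b): Suppose that there is a $z \in \mathcal{H}$, $z \neq 0$, with the property $\langle x_n, z\rangle = 0$
for all $n \in \mathbb{N}$. Then $B' = \{ z/\|z\|, x_1, x_2, \ldots \}$ is an orthonormal system in $\mathcal{H}$ in which
$B$ is properly contained, contradicting the maximality of $B$. Hence there is no such
vector $z \in \mathcal{H}$ and statement (b) follows.

(b) $\Rightarrow$ (c): Given $x \in \mathcal{H}$ introduce the sequence $x^{(N)} = \sum_{n=1}^{N} \langle x_n, x\rangle x_n$. Bessel's
inequality (Corollary 15.1) shows that $\|x^{(N)}\|^2 = \sum_{n=1}^{N} |\langle x_n, x\rangle|^2 \le \|x\|^2$ for all
$N \in \mathbb{N}$. Hence, the infinite series $\sum_{n=1}^{\infty} |\langle x_n, x\rangle|^2$ converges and its value is less than
or equal to $\|x\|^2$. For all $M < N$ we have $\|x^{(N)} - x^{(M)}\|^2 = \sum_{n=M+1}^{N} |\langle x_n, x\rangle|^2$
and the convergence of the series $\sum_{n=1}^{\infty} |\langle x_n, x\rangle|^2$ implies that $(x^{(N)})_{N \in \mathbb{N}}$ is a Cauchy
sequence. Hence this sequence converges to a unique point $y \in \mathcal{H}$,

$$y = \lim_{N \to \infty} x^{(N)} = \sum_{n=1}^{\infty} \langle x_n, x\rangle\, x_n .$$

Since the inner product is continuous we deduce that $\langle x_n, y\rangle = \lim_{N \to \infty} \langle x_n, x^{(N)}\rangle =
\langle x_n, x\rangle$ for all $n \in \mathbb{N}$. Therefore $\langle x_n, x - y\rangle = 0$ for all $n \in \mathbb{N}$ and hypothesis (b)
implies $x - y = 0$, hence statement (c) follows.

(c) $\Rightarrow$ (d): According to statement (c) any vector $x \in \mathcal{H}$ has a Fourier expansion,
$x = \sum_{n=1}^{\infty} \langle x_n, x\rangle x_n$, similarly for $y \in \mathcal{H}$: $y = \sum_{n=1}^{\infty} \langle x_n, y\rangle x_n$. Continuity of the
inner product and orthonormality of $\{x_n : n \in \mathbb{N}\}$ imply the completeness relation:

$$\langle x, y\rangle = \sum_{n=1}^{\infty} \overline{\langle x_n, x\rangle}\, \langle x_n, y\rangle = \sum_{n=1}^{\infty} \langle x, x_n\rangle \langle x_n, y\rangle .$$

(d) $\Rightarrow$ (e): Obviously, statement (e) is just the special case $x = y$ of statement (d).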




(e) $\Rightarrow$ (a): Suppose that the system $B$ is not maximal. Then we can add one unit
vector $z \in \mathcal{H}$ to it which is orthogonal to $B$. Now Parseval's relation (e) gives the
contradiction

$$1 = \|z\|^2 = \sum_{n=1}^{\infty} |\langle x_n, z\rangle|^2 = \sum_{n=1}^{\infty} 0 = 0.$$

Therefore, when Parseval's relation holds for every $x \in \mathcal{H}$, the system $B$ is maximal.
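
The difference between a maximal and a non-maximal orthonormal system can be seen numerically in $\mathbb{R}^3$. The sketch below uses an orthonormal basis and a two-element subsystem chosen here purely for illustration; it compares the Fourier sum and the Parseval sum of statements (c) and (e) for both systems.

```python
import math

def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

s = 1.0 / math.sqrt(2.0)
onb = [(s, s, 0.0), (s, -s, 0.0), (0.0, 0.0, 1.0)]  # maximal orthonormal system in R^3
ons = onb[:2]                                        # not maximal: (0, 0, 1) is missing

def fourier_sum(system, x):
    """Return sum_n <x_n, x> x_n for the given orthonormal system."""
    out = [0.0] * len(x)
    for xn in system:
        c = inner(xn, x)
        out = [o + c * a for o, a in zip(out, xn)]
    return out

def parseval_sum(system, x):
    """Return sum_n |<x_n, x>|^2 for the given orthonormal system."""
    return sum(inner(xn, x) ** 2 for xn in system)

x = (1.0, 2.0, 3.0)
print("||x||^2          :", inner(x, x))
print("Parseval sum, ONB:", parseval_sum(onb, x), fourier_sum(onb, x))
print("Bessel sum, ONS  :", parseval_sum(ons, x), fourier_sum(ons, x))
```

For the full basis the Fourier sum reproduces $x$ and the Parseval sum equals $\|x\|^2$; for the smaller system only Bessel's inequality survives and the component along the missing vector is lost.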


As a first application of the characterization of an orthonormal basis we deter- 
mine explicitly the closed linear hull of an orthonormal system (ONS). As a simple 
consequence one obtains a characterization of separable Hilbert spaces. 


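Corollary 17.1 Let $\{x_n : n \in \mathbb{N}\}$ be an orthonormal system in a Hilbert space $\mathcal{H}$
over the field $\mathbb{K}$. Denote the closed linear hull of this system by $M$, i.e., $M =
[\{x_n : n \in \mathbb{N}\}]$. Then, the following holds:


1. $M = \{x_c \in \mathcal{H} : x_c = \sum_{n=1}^{\infty} c_n x_n, \ c = (c_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{K})\}$.
2. The mapping $U : \ell^2(\mathbb{K}) \to M$, defined by $Uc = x_c = \sum_{n=1}^{\infty} c_n x_n$, is an
isomorphism and one has

$$\langle Uc, Uc'\rangle_{\mathcal{H}} = \langle c, c'\rangle_{\ell^2(\mathbb{K})} \qquad \forall\, c, c' \in \ell^2(\mathbb{K}).$$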


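Proof For $c = (c_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{K})$ define a sequence $x^{(N)} = \sum_{n=1}^{N} c_n x_n \in
\operatorname{lin} \{x_n : n \in \mathbb{N}\}$ in the Hilbert space $\mathcal{H}$. Since $\{x_n : n \in \mathbb{N}\}$ is an orthonormal system
one has for all $N, M \in \mathbb{N}$, $M < N$, $\|x^{(N)} - x^{(M)}\|^2 = \sum_{n=M+1}^{N} |c_n|^2$. It follows that
$(x^{(N)})_{N \in \mathbb{N}}$ is a Cauchy sequence in the Hilbert space $\mathcal{H}$ and thus it converges to

$$x_c = \lim_{N \to \infty} x^{(N)} = \sum_{n=1}^{\infty} c_n x_n \in \mathcal{H}.$$

Obviously, $x_c$ belongs to the closure $M$ of the linear hull of the given orthonormal
system. Hence $\ell^2(\mathbb{K}) \ni c \mapsto x_c$ defines a map $U$ from $\ell^2(\mathbb{K})$ into $M$. This map
is linear as one easily shows. Under the restriction of the inner product of $\mathcal{H}$ the
closed linear subspace $M$ is itself a Hilbert space which has, by definition, the given
ONS as a Hilbert space basis. Therefore, by Theorem 17.2, for every $x \in M$ one
has $x = \sum_{n=1}^{\infty} \langle x_n, x\rangle_{\mathcal{H}}\, x_n$ and $\|x\|_{\mathcal{H}}^2 = \sum_{n=1}^{\infty} |\langle x_n, x\rangle_{\mathcal{H}}|^2$. Hence every $x \in M$ is
the image of the sequence $c = (\langle x_n, x\rangle_{\mathcal{H}})_{n \in \mathbb{N}} \in \ell^2(\mathbb{K})$ under the map $U$, i.e., $U$ is
a linear map from $\ell^2(\mathbb{K})$ onto $M$, and the inverse map of $U$ is the map $M \ni x \mapsto
U^{-1}x = (\langle x_n, x\rangle_{\mathcal{H}})_{n \in \mathbb{N}} \in \ell^2(\mathbb{K})$.
For $c = (c_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{K})$ we calculate $\langle x_n, Uc\rangle_{\mathcal{H}} = c_n$ for all $n \in \mathbb{N}$ and thus by
the completeness relation of Theorem 17.2

$$\langle Uc, Uc'\rangle_{\mathcal{H}} = \sum_{n=1}^{\infty} \overline{c_n}\, c'_n = \langle c, c'\rangle_{\ell^2(\mathbb{K})} \qquad \forall\, c, c' \in \ell^2(\mathbb{K}).$$

In particular one has $\|Uc\|_{\mathcal{H}} = \|c\|_{\ell^2(\mathbb{K})}$ for all $c \in \ell^2(\mathbb{K})$. Thus $U$ is a bijective
continuous linear map with continuous inverse, which does not change the values of
the inner products, i.e., an isomorphism of Hilbert spaces.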




Corollary 17.2 Every infinite dimensional separable Hilbert space $\mathcal{H}$ over the field
$\mathbb{K}$ is isomorphic to the sequence space $\ell^2(\mathbb{K})$.


Proof If $\{x_n : n \in \mathbb{N}\}$ is an orthonormal basis we know that the closed linear subspace
$M$ generated by this basis is equal to the Hilbert space $\mathcal{H}$. Hence, by the previous
corollary we conclude.

Later we will learn that, for instance, the Lebesgue space $L^2(\mathbb{R}^n, dx)$ is a separable
Hilbert space. According to Corollary 17.2 this Lebesgue space is isomorphic to the
sequence space $\ell^2(\mathbb{K})$. Why then is it important to study other separable Hilbert
spaces than the sequence space $\ell^2(\mathbb{K})$? These other separable Hilbert spaces have,
just as the Lebesgue space, an additional structure which is lost if they are realized as
sequence spaces. While linear partial differential operators, for instance Schrödinger
operators, can be studied conveniently in the Lebesgue space, this is in general not
the case in the sequence space. In the second section of this chapter we will construct
explicitly an orthonormal basis for Hilbert spaces $L^2(I)$ of square integrable functions
over some interval $I$. It turns out that the elements of the ONB's constructed there
are "eigenfunctions" of important differential operators.

The results on the characterization of an orthonormal basis are quite powerful. We
illustrate this with the example of the theory of Fourier expansions in the Hilbert
space $L^2([0, 2\pi], dx)$.

We begin by recalling some classical results. For integrable functions $f$ on the
interval $[0, 2\pi]$ the integrals

$$c_n = c_n(f) = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} e^{-inx} f(x)\, dx = \langle e_n, f\rangle_2$$

are well-defined. In the Exercises one shows that the system of functions $e_n$, $n \in \mathbb{Z}$,
$e_n(x) = \frac{1}{\sqrt{2\pi}} e^{inx}$, is an orthonormal system in the Hilbert space $L^2([0, 2\pi], dx)$. With the
above numbers $c_n$ one forms the Fourier series

$$\sum_{n=-\infty}^{+\infty} c_n(f)\, e_n$$

of the function $f$. A classical result from the theory of Fourier series reads (see [1]):
If $f$ is continuously differentiable on the interval $[0, 2\pi]$, then the Fourier series
converges uniformly to $f$, i.e., the sequence of partial sums of the Fourier series
converges uniformly to $f$. This implies in particular

$$\lim_{N \to \infty} \Bigl\| f - \sum_{n=-N}^{N} c_n(f)\, e_n \Bigr\|_2 = 0$$

for all $f \in C^1([0, 2\pi])$.
We claim that the system $\{e_n : n \in \mathbb{Z}\}$ is actually an orthonormal basis of
$L^2([0, 2\pi], dx)$. For the proof take any $g \in L^2([0, 2\pi], dx)$ with the property
$\langle e_n, g\rangle_2 = 0$ for all $n \in \mathbb{Z}$. From the above convergence result we deduce, for
all $f \in C^1([0, 2\pi])$,

$$\langle f, g\rangle_2 = \lim_{N \to \infty} \Bigl\langle \sum_{n=-N}^{N} c_n(f)\, e_n, g \Bigr\rangle_2 = 0.$$

Since $C^1([0, 2\pi])$ is known to be dense in $L^2([0, 2\pi], dx)$ it follows that $g = 0$,
by Corollary 17.2, hence by Theorem 17.2, this system is an orthonormal basis
of $L^2([0, 2\pi], dx)$. Therefore, every $f \in L^2([0, 2\pi], dx)$ has a Fourier expansion,
which converges (in the sense of the $L^2$-topology). Thus, convergence of the Fourier
series in the $L^2$-topology is "natural," from the point of view of having convergence
of this series for the largest class of functions.
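
Convergence in the $L^2$-topology can be observed numerically. The sketch below is an illustration under assumptions made here: the sample function $f(x) = x$ on $[0, 2\pi]$, a midpoint rule with a hypothetical number of sample points for the coefficients $c_n = \langle e_n, f\rangle_2$, and the identity $\|f - S_N\|_2^2 = \|f\|_2^2 - \sum_{|n| \le N} |c_n|^2$, which follows from the orthonormality of the $e_n$.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=4096):
    """Approximate c_n = <e_n, f>_2 with e_n(x) = exp(i n x) / sqrt(2 pi) by the midpoint rule."""
    h = 2.0 * math.pi / samples
    total = 0j
    for k in range(samples):
        x = (k + 0.5) * h
        total += cmath.exp(-1j * n * x) * f(x)
    return total * h / math.sqrt(2.0 * math.pi)

def l2_error_squared(f, norm_sq, N):
    """||f - S_N||_2^2 = ||f||_2^2 - sum_{|n| <= N} |c_n|^2 for the partial sum S_N."""
    return norm_sq - sum(abs(fourier_coefficient(f, n)) ** 2 for n in range(-N, N + 1))

def f(x):
    """Sample function; its periodic extension jumps at multiples of 2 pi."""
    return x

norm_sq = (2.0 * math.pi) ** 3 / 3.0  # integral of x^2 over [0, 2 pi]
errors = [l2_error_squared(f, norm_sq, N) for N in (1, 4, 16)]
for N, err in zip((1, 4, 16), errors):
    print(N, err)
```

The squared error decreases as $N$ grows, although the partial sums of this particular series do not converge uniformly, which is in line with the remark that $L^2$-convergence is the natural notion here.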


17.2 Weight Functions and Orthogonal Polynomials 


Not only for the interval $I = [0, 2\pi]$ are the Hilbert spaces $L^2(I, dx)$ separable,
but for any interval $I = [a, b]$, $-\infty \le a < b \le +\infty$, as the results of this section
will show. Furthermore an orthonormal basis will be constructed explicitly and some
interesting properties of the elements of such a basis will be investigated.

The starting point is a weight function $\rho : I \to \mathbb{R}$ on the interval $I$ which is assumed
to have the following properties:


1. On the interval $I$, the function $\rho$ is strictly positive: $\rho(x) > 0$ for all $x \in I$.
2. If the interval $I$ is not bounded, there are two positive constants $\alpha$ and $C$ such
that $\rho(x) e^{\alpha |x|} \le C$ for all $x \in I$.


The strategy to prove that the Hilbert space $L^2(I, dx)$ is separable is quite simple. A
first step shows that the countable set of functions $\rho_n(x) = x^n \sqrt{\rho(x)}$, $n = 0, 1, 2, \ldots$
is total in this Hilbert space. The Gram–Schmidt orthonormalization then produces
easily an orthonormal basis.


Lemma 17.1 The system of functions $\{\rho_n : n = 0, 1, 2, \ldots\}$ is total in the Hilbert
space $L^2(I, dx)$, for any interval $I$.


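Proof For the proof we have to show: If an element $h \in L^2(I, dx)$ satisfies $\langle \rho_n, h\rangle_2 =
0$ for all $n$, then $h = 0$.

In the case $I \neq \mathbb{R}$ we consider $h$ to be extended by $0$ to $\mathbb{R} \setminus I$ and thus get a
function $h \in L^2(\mathbb{R}, dx)$. On the strip $S_\alpha = \{p = u + iv \in \mathbb{C} : u, v \in \mathbb{R}, \ |v| < \alpha\}$,
introduce the auxiliary function

$$F(p) = \int_{\mathbb{R}} \sqrt{\rho(x)}\, h(x)\, e^{ipx}\, dx .$$

The growth restriction on the weight function implies that $F$ is a well-defined
holomorphic function on $S_\alpha$ (see Exercises). Differentiation of $F$ generates the functions
$\rho_n$ in this integral:

$$\frac{d^n F}{dp^n}(p) = i^n \int_{\mathbb{R}} h(x) \sqrt{\rho(x)}\, x^n e^{ipx}\, dx$$

for $n = 0, 1, 2, \ldots$, and we deduce $F^{(n)}(0) = i^n \langle \rho_n, h\rangle_2 = 0$ for all $n$. Since $F$ is
holomorphic in the strip $S_\alpha$ it follows that $F(p) = 0$ for all $p \in S_\alpha$ (see Theorem
9.5) and thus in particular $F(p) = 0$ for all $p \in \mathbb{R}$. But $F(p) = \sqrt{2\pi}\, \mathcal{L}(\sqrt{\rho}\, h)(p)$
where $\mathcal{L}$ is the inverse Fourier transform (see Theorem 10.1), and we know
$\langle \mathcal{L}f, \mathcal{L}g\rangle_2 = \langle f, g\rangle_2$ for all $f, g \in L^2(\mathbb{R}, dx)$ (Theorem 10.7). It follows that
$\langle \sqrt{\rho}\, h, \sqrt{\rho}\, h\rangle_2 = \langle \mathcal{L}(\sqrt{\rho}\, h), \mathcal{L}(\sqrt{\rho}\, h)\rangle_2 = 0$ and thus $\sqrt{\rho}\, h = 0 \in L^2(\mathbb{R}, dx)$. Since $\rho(x) > 0$
for $x \in I$ this implies $h = 0$ and we conclude.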

Technically it is simpler to do the orthonormalization of the system of functions
$\{\rho_n : n = 0, 1, 2, \ldots\}$ not in the Hilbert space $L^2(I, dx)$ directly but in the Hilbert space
$L^2(I, \rho\, dx)$, which is defined as the space of all equivalence classes of measurable
functions $f : I \to \mathbb{K}$ such that $\int_I |f(x)|^2 \rho(x)\, dx < \infty$, equipped with the inner product
$\langle f, g\rangle_\rho = \int_I \overline{f(x)}\, g(x) \rho(x)\, dx$. Note that the relation $\langle f, g\rangle_\rho = \langle \sqrt{\rho} f, \sqrt{\rho} g\rangle_2$
holds for all $f, g \in L^2(I, \rho\, dx)$. It implies that the Hilbert spaces $L^2(I, \rho\, dx)$ and
$L^2(I, dx)$ are (isometrically) isomorphic under the map

$$L^2(I, \rho\, dx) \ni f \mapsto \sqrt{\rho}\, f \in L^2(I, dx).$$

This is shown in the Exercises. Using this isomorphism, Lemma 17.1 can be restated
as saying that the system of powers of $x$, $\{x^n : n = 0, 1, 2, \ldots\}$, is total in the Hilbert
space $L^2(I, \rho\, dx)$.

We proceed by applying the Gram–Schmidt orthonormalization to the system of
powers $\{x^n : n = 0, 1, 2, \ldots\}$ in the Hilbert space $L^2(I, \rho\, dx)$. This gives a sequence
of polynomials $P_k$ of degree $k$ such that $\langle P_k, P_m\rangle_\rho = \delta_{km}$. These polynomials are
defined recursively in the following way: $Q_0(x) = x^0 = 1$, and when for $k \ge 1$ the
polynomials $Q_0, \ldots, Q_{k-1}$ are defined, we define the polynomial $Q_k$ by

$$Q_k(x) = x^k - \sum_{n=0}^{k-1} \frac{\langle Q_n, x^k\rangle_\rho}{\langle Q_n, Q_n\rangle_\rho}\, Q_n(x).$$

Finally, the polynomials $Q_k$ are normalized and we arrive at an orthonormal system
of polynomials $P_k$:

$$P_k = \frac{1}{\|Q_k\|_\rho}\, Q_k, \qquad k = 0, 1, 2, \ldots.$$

Note that according to this construction, $P_k$ is a polynomial of degree $k$ with positive
coefficient for the power $x^k$. Theorem 17.1 and Lemma 17.1 imply that the system
of polynomials $\{P_k : k = 0, 1, 2, \ldots\}$ is an orthonormal basis of the Hilbert space
$L^2(I, \rho\, dx)$. If we now introduce the functions

$$e_k(x) = P_k(x) \sqrt{\rho(x)}, \qquad x \in I,$$

we obtain an orthonormal basis of the Hilbert space $L^2(I, dx)$. This shows Theorem
17.3.
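
The recursive definition of the polynomials $Q_k$ can be carried out in exact arithmetic once the moments of the weight are known. The sketch below assumes, for illustration only, the interval $I = [-1, 1]$ with the weight $\rho = 1$, for which $\int_{-1}^{1} x^k\, dx$ is $2/(k+1)$ for even $k$ and $0$ for odd $k$; polynomials are stored as coefficient lists `[a0, a1, ...]`, and all function names are choices made here.

```python
from fractions import Fraction
import math

def moment(k):
    """Integral of x**k over [-1, 1] for the illustrative weight rho = 1."""
    return Fraction(2, k + 1) if k % 2 == 0 else Fraction(0)

def inner(p, q):
    """<p, q>_rho for real polynomials given as coefficient lists [a0, a1, ...]."""
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def orthogonal_polynomials(n):
    """Return Q_0, ..., Q_n obtained from the powers 1, x, x^2, ... by Gram-Schmidt."""
    qs = []
    for k in range(n + 1):
        power = [Fraction(0)] * k + [Fraction(1)]   # coefficient list of x^k
        q = list(power)
        for qn in qs:
            c = inner(qn, power) / inner(qn, qn)
            padded = qn + [Fraction(0)] * (k + 1 - len(qn))
            q = [a - c * b for a, b in zip(q, padded)]
        qs.append(q)
    return qs

def normalize(q):
    """Return the coefficients of P_k = Q_k / ||Q_k||_rho as floats."""
    norm = math.sqrt(inner(q, q))
    return [float(a) / norm for a in q]

qs = orthogonal_polynomials(3)
ps = [normalize(q) for q in qs]
for q in qs:
    print([str(a) for a in q])
```

For this weight the $Q_k$ are the monic multiples of the classical Legendre polynomials; other weights only require a different `moment` function.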




Theorem 17.3 For any interval $I = (a, b)$, $-\infty \le a < b \le +\infty$, the Hilbert space
$L^2(I, dx)$ is separable, and the above system $\{e_k : k = 0, 1, 2, \ldots\}$ is an orthonormal
basis.


Proof Only the existence of a weight function for the interval $I$ has to be shown.
Then by the preceding discussion we conclude. A simple choice of a weight function
for any of these intervals is for instance the exponential function $\rho(x) = e^{-\alpha x^2}$,
$x \in \mathbb{R}$, for some $\alpha > 0$.

Naturally, the orthonormal polynomials $P_k$ depend on the interval and the weight
function. After some general properties of these polynomials have been studied we
will determine the orthonormal polynomials for some intervals and weight functions
explicitly.


Lemma 17.2 If $Q_m$ is a polynomial of degree $m$, then $\langle Q_m, P_k\rangle_\rho = 0$ for all $k > m$.

Proof Since $\{P_k : k = 0, 1, 2, \ldots\}$ is an ONB of the Hilbert space $L^2(I, \rho\, dx)$ the
polynomial $Q_m$ has a Fourier expansion with respect to this ONB: $Q_m = \sum_{n=0}^{\infty} c_n P_n$,
$c_n = \langle P_n, Q_m\rangle_\rho$. Since the powers $x^k$, $k = 0, 1, 2, \ldots$ are linearly independent
functions on the interval $I$ and since the degree of $Q_m$ is $m$ and that of $P_n$ is $n$, the
coefficients $c_n$ in this expansion must vanish for $n > m$, i.e., $Q_m = \sum_{n=0}^{m} c_n P_n$ and
thus $\langle P_k, Q_m\rangle_\rho = 0$ for all $k > m$.

Since the orthonormal system $\{P_k : k = 0, 1, 2, \ldots\}$ is obtained by the Gram–Schmidt
orthonormalization from the system of powers $x^k$ for $k = 0, 1, 2, \ldots$ with
respect to the inner product $\langle\cdot,\cdot\rangle_\rho$, the polynomial $P_{n+1}$ is generated by multiplying
the polynomial $P_n$ with $x$ and adding some lower order polynomial as correction.
Indeed one has


Proposition 17.1 Let $\rho$ be a weight for the interval $I = (a, b)$ and denote the
complete system of orthonormal polynomials for this weight and this interval by
$\{P_k : k = 0, 1, 2, \ldots\}$. Then, for every $n \ge 1$, there are constants $A_n$, $B_n$, $C_n$ such
that

$$P_{n+1}(x) = (A_n x + B_n) P_n(x) + C_n P_{n-1}(x) \qquad \forall\, x \in I.$$


Proof We know $P_k(x) = a_k x^k + Q_{k-1}(x)$ with some constant $a_k > 0$ and some
polynomial $Q_{k-1}$ of degree smaller than or equal to $k - 1$. Thus, if we define $A_n =
\frac{a_{n+1}}{a_n}$, it follows that $P_{n+1} - A_n x P_n$ is a polynomial of degree smaller than or equal
to $n$, hence there are constants $c_{n,k}$ such that

$$P_{n+1} - A_n x P_n = \sum_{k=0}^{n} c_{n,k} P_k .$$

Now calculate the inner product with $P_j$, $j \le n$:

$$\langle P_j, P_{n+1} - A_n x P_n\rangle_\rho = \sum_{k=0}^{n} c_{n,k} \langle P_j, P_k\rangle_\rho = c_{n,j}.$$

Since the polynomial $P_k$ is orthogonal to all polynomials $Q_j$ of degree $j \le k - 1$
we deduce that $c_{n,j} = 0$ for all $j < n - 1$, $c_{n,n-1} = -A_n \langle x P_{n-1}, P_n\rangle_\rho$, and $c_{n,n} =
-A_n \langle x P_n, P_n\rangle_\rho$. The statement follows by choosing $B_n = c_{n,n}$ and $C_n = c_{n,n-1}$.
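
As a worked instance, one may take the Hermite case treated in Sect. 17.3.1. The computation below is a sketch under the assumption that the orthonormal polynomials are $P_n = c_n H_n$ with $c_n = (2^n n! \sqrt{\pi})^{-1/2}$ and that the Hermite polynomials satisfy the recursion (17.2), $H_{n+1} = 2x H_n - 2n H_{n-1}$. Multiplying (17.2) by $c_{n+1}$ gives the constants of Proposition 17.1:

```latex
P_{n+1}(x) = \frac{2\,c_{n+1}}{c_n}\, x\, P_n(x) - \frac{2n\, c_{n+1}}{c_{n-1}}\, P_{n-1}(x),
\qquad
\frac{c_{n+1}}{c_n} = \frac{1}{\sqrt{2(n+1)}}, \quad
\frac{c_{n+1}}{c_{n-1}} = \frac{1}{2\sqrt{n(n+1)}},
\\[4pt]
\Longrightarrow\quad
A_n = \sqrt{\frac{2}{n+1}}, \qquad B_n = 0, \qquad C_n = -\sqrt{\frac{n}{n+1}} .
```

Here $B_n = 0$ reflects the symmetry of the weight $e^{-x^2}$ under $x \mapsto -x$.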


Proposition 17.2 For any weight function $\rho$ on the interval $I$, the $k$th orthonormal
polynomial $P_k$ has exactly $k$ simple real zeroes.


Proof Per construction the orthonormal polynomials $P_k$ have real coefficients, have
the degree $k$, and the leading coefficient $c_k$ is positive. The fundamental theorem of algebra
(Theorem 9.4) implies: The polynomial $P_k$ has a certain number $m \le k$ of simple
real roots $x_1, \ldots, x_m$ and the roots which are not real occur in pairs of complex
conjugate numbers, $(z_j, \overline{z_j})$, $j = m+1, \ldots, M$, with the same multiplicity $n_j$,
$m + 2 \sum_{j=m+1}^{M} n_j = k$. Therefore the polynomial $P_k$ can be written as

$$P_k(x) = c_k \prod_{j=1}^{m} (x - x_j) \prod_{j=m+1}^{M} (x - z_j)^{n_j} (x - \overline{z_j})^{n_j}.$$

Consider the polynomial $Q_m(x) = c_k \prod_{j=1}^{m} (x - x_j)$. It has the degree $m$ and
exactly $m$ real simple roots. Since $P_k(x) = Q_m(x) \prod_{j=m+1}^{M} |x - z_j|^{2n_j}$, it follows that
$P_k(x) Q_m(x) \ge 0$ for all $x \in I$ and $P_k Q_m \neq 0$, hence $\langle P_k, Q_m\rangle_\rho > 0$. If the degree
$m$ of the polynomial $Q_m$ were smaller than $k$, we would arrive at a contradiction
to the result of the previous lemma, hence $m = k$ and the pairs of complex conjugate
roots cannot occur. Thus we conclude.

In the Exercises, with the same argument, we prove the following extension of 
this proposition. 


Lemma 17.3 The polynomial $Q_k(x, \lambda) = P_k(x) + \lambda P_{k-1}(x)$ has $k$ simple real roots,
for any $\lambda \in \mathbb{R}$.


Lemma 17.4 There are no points $x_0 \in I$ and no integer $k > 0$ such that $P_k(x_0) =
P_{k-1}(x_0) = 0$.


Proof Suppose that for some $k > 0$ the orthonormal polynomials $P_k$ and $P_{k-1}$ have a
common root $x_0 \in I$: $P_k(x_0) = P_{k-1}(x_0) = 0$. Since we know that these orthonormal
polynomials have simple real roots, we know in particular $P'_{k-1}(x_0) \neq 0$ and thus
we can take the real number $\lambda_0 = -\frac{P'_k(x_0)}{P'_{k-1}(x_0)}$ to form the polynomial $Q_k(x, \lambda_0) =
P_k(x) + \lambda_0 P_{k-1}(x)$. It follows that $Q_k(x_0, \lambda_0) = 0$ and $Q'_k(x_0, \lambda_0) = 0$, i.e., $x_0$ is a root
of $Q_k(\cdot, \lambda_0)$ with multiplicity at least two. But this contradicts the previous lemma.
Hence there is no common root of the polynomials $P_k$ and $P_{k-1}$.


Theorem 17.4 (Knotensatz) Let $\{P_k : k = 0, 1, 2, \ldots\}$ be the orthonormal basis for
some interval $I$ and some weight function $\rho$. Then the roots of $P_{k-1}$ separate the
roots of $P_k$, i.e., between two successive roots of $P_k$ there is exactly one root of $P_{k-1}$.


Proof Suppose that $\alpha < \beta$ are two successive roots of the polynomial $P_k$ so that
$P_k(x) \neq 0$ for all $x \in (\alpha, \beta)$. Assume furthermore that $P_{k-1}$ has no root in the open
interval $(\alpha, \beta)$. The previous lemma implies that $P_{k-1}$ does not vanish in the closed
interval $[\alpha, \beta]$. Since the polynomials $P_{k-1}$ and $-P_{k-1}$ have the same system of roots,
we can assume that $P_{k-1}$ is positive in $[\alpha, \beta]$ and $P_k$ is negative in $(\alpha, \beta)$. Define the
function $f(x) = -\frac{P_k(x)}{P_{k-1}(x)}$. It is continuous on $[\alpha, \beta]$ and satisfies $f(\alpha) = f(\beta) = 0$
and $f(x) > 0$ for all $x \in (\alpha, \beta)$. It follows that $\lambda_0 = \sup \{ f(x) : x \in [\alpha, \beta] \} = f(x_0)$
for some $x_0 \in (\alpha, \beta)$. Now consider the family of polynomials $Q_k(x, \lambda) = P_k(x) +
\lambda P_{k-1}(x) = P_{k-1}(x)(\lambda - f(x))$. Therefore, for all $\lambda \ge \lambda_0$, the polynomials $Q_k(\cdot, \lambda)$
are nonnegative on $[\alpha, \beta]$, in particular $Q_k(x, \lambda_0) \ge 0$ for all $x \in [\alpha, \beta]$. Since
$\lambda_0 = f(x_0)$, it follows that $Q_k(x_0, \lambda_0) = 0$, thus $Q_k(\cdot, \lambda_0)$ has a root $x_0 \in (\alpha, \beta)$.
Since $f$ has a maximum at $x_0$, we know $0 = f'(x_0)$. The derivative of $f$ is easily
calculated:

$$f'(x) = -\frac{P'_k(x) P_{k-1}(x) - P_k(x) P'_{k-1}(x)}{P_{k-1}(x)^2}.$$

Thus $f'(x_0) = 0$ implies $P'_k(x_0) P_{k-1}(x_0) - P_k(x_0) P'_{k-1}(x_0) = 0$, and therefore
$Q'_k(x_0, \lambda_0) = P'_k(x_0) + f(x_0) P'_{k-1}(x_0) = 0$. Hence the polynomial $Q_k(\cdot, \lambda_0)$ has a root of
multiplicity 2 at $x_0$. This contradicts Lemma 17.3 and therefore the polynomial $P_{k-1}$
has at least one root in the interval $(\alpha, \beta)$. Since $P_{k-1}$ has exactly $k - 1$ simple real
roots according to Proposition 17.2, we conclude that $P_{k-1}$ has exactly one simple
root in $(\alpha, \beta)$ which proves the theorem.
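
The separation of roots can be checked numerically for a concrete family. The sketch below assumes, for illustration, the interval $[-1, 1]$ with weight $\rho = 1$ and uses the classical Legendre polynomials, which differ from the orthonormal $P_k$ only by positive factors and therefore have the same roots. They are evaluated with Bonnet's recursion $(k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)$, a standard fact not taken from the text; the grid size and the function names are choices made here.

```python
def legendre(n, x):
    """Evaluate the classical Legendre polynomial of degree n by Bonnet's recursion."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def roots(n, cells=2001):
    """Locate the simple zeros in (-1, 1) by sign changes on a grid plus bisection."""
    grid = [-1.0 + 2.0 * i / cells for i in range(cells + 1)]
    found = []
    for a, b in zip(grid, grid[1:]):
        if legendre(n, a) * legendre(n, b) < 0:
            for _ in range(60):
                mid = 0.5 * (a + b)
                if legendre(n, a) * legendre(n, mid) <= 0:
                    b = mid
                else:
                    a = mid
            found.append(0.5 * (a + b))
    return found

def separates(inner_roots, outer_roots):
    """True if exactly one inner root lies between any two successive outer roots."""
    return all(sum(1 for r in inner_roots if a < r < b) == 1
               for a, b in zip(outer_roots, outer_roots[1:]))

for k in range(2, 7):
    print(k, len(roots(k)), separates(roots(k - 1), roots(k)))
```

The odd number of grid cells keeps the root at $x = 0$ of the odd-degree polynomials strictly inside a cell, so the sign-change test does not miss it.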


Remark 17.1 Consider the function

$$F(Q) = \int_I Q(x)^2 \rho(x)\, dx, \qquad Q(x) = \sum_{k=0}^{n} a_k x^k .$$

Since we can expand $Q$ in terms of the orthonormal basis $\{P_k : k = 0, 1, 2, \ldots\}$,
$Q = \sum_{k=0}^{n} c_k P_k$, $c_k = \langle P_k, Q\rangle_\rho$, the value of the function $F$ can be expressed in
terms of the coefficients $c_k$ as $F(Q) = \sum_{k=0}^{n} c_k^2$ and it follows that the orthonormal
polynomials $P_k$ minimize the function $Q \mapsto F(Q)$ under obvious constraints (see
Exercises).


17.3 Examples of Complete Orthonormal Systems for $L^2(I, \rho\,dx)$


For the intervals $I = \mathbb{R}$, $I = \mathbb{R}^+ = [0, \infty)$, and $I = [-1, 1]$ we are going to construct an orthonormal basis explicitly by choosing a suitable weight function and applying the construction explained above. The general results above, in particular the "Knotensatz," apply to these concrete examples.


17.3.1 $I = \mathbb{R}$, $\rho(x) = e^{-x^2}$: Hermite Polynomials


Evidently, the function $\rho(x) = e^{-x^2}$ is a weight function for the real line. Therefore, by Lemma 17.1, the system of functions $\rho_n(x) = x^n e^{-x^2/2}$ generates the Hilbert space $L^2(\mathbb{R}, dx)$. Finally, the Gram-Schmidt orthonormalization produces an orthonormal basis $\{h_n : n = 0, 1, 2, \ldots\}$. The elements of this basis have the form (Rodrigues' formula)

$$h_n(x) = (-1)^n c_n\, e^{x^2/2} \left(\frac{d}{dx}\right)^{n} e^{-x^2} = c_n H_n(x)\, e^{-x^2/2} \tag{17.1}$$

with normalization constants

$$c_n = \left(2^n n! \sqrt{\pi}\right)^{-1/2}, \qquad n = 0, 1, 2, \ldots.$$


Here the functions $H_n$ are polynomials of degree $n$, called Hermite polynomials, and the functions $h_n$ are the Hermite functions of order $n$.


Theorem 17.5 The system of Hermite functions $\{h_n : n = 0, 1, 2, \ldots\}$ is an orthonormal basis of the Hilbert space $L^2(\mathbb{R}, dx)$. The statements of Theorem 17.4 apply to the Hermite polynomials.

Using Eq. (17.1) one deduces in the Exercises that the Hermite polynomials satisfy the recursion relation

$$H_{n+1}(x) - 2x H_n(x) + 2n H_{n-1}(x) = 0 \tag{17.2}$$

and the differential equation ($y = H_n(x)$)

$$y'' - 2x y' + 2n y = 0. \tag{17.3}$$


These relations show that the Hermite functions $h_n$ are the eigenfunctions of the quantum harmonic oscillator with the Hamiltonian $H = \frac{1}{2}(P^2 + Q^2)$ for the eigenvalue $n + \frac{1}{2}$: $H h_n = (n + \frac{1}{2}) h_n$, $n = 0, 1, 2, \ldots$. For more details we refer to [2-4]. In these references one also finds other methods to prove that the Hermite functions form an orthonormal basis.

Note also that the Hermite functions belong to the Schwartz test function space $\mathcal{S}(\mathbb{R})$.
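The statements about Hermite functions can be checked numerically. The following is a minimal sketch, not part of the original text: it builds $H_n$ from the recursion (17.2) and evaluates the Gram matrix $\langle h_n, h_m \rangle$ by Gauss-Hermite quadrature, assuming NumPy is available. The function names `hermite_by_recursion` and `hermite_gram_matrix` are hypothetical helpers introduced for illustration.

```python
import math
import numpy as np
from numpy.polynomial import hermite as H  # physicists' Hermite polynomials


def hermite_by_recursion(n_max, x):
    """Return [H_0(x), ..., H_{n_max}(x)] built from the recursion (17.2)."""
    x = np.asarray(x, dtype=float)
    values = [np.ones_like(x), 2.0 * x]
    for n in range(1, n_max):
        # H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x)
        values.append(2.0 * x * values[n] - 2.0 * n * values[n - 1])
    return values[: n_max + 1]


def hermite_gram_matrix(n_max):
    """Gram matrix <h_n, h_m> in L^2(R, dx) for n, m = 0..n_max."""
    # Gauss-Hermite quadrature with n_max + 1 nodes integrates
    # p(x) * exp(-x^2) exactly for polynomials p of degree <= 2 * n_max + 1.
    nodes, weights = H.hermgauss(n_max + 1)
    vals = hermite_by_recursion(n_max, nodes)
    # Normalization constants c_n = (2^n n! sqrt(pi))^(-1/2) from (17.1).
    c = [(2.0**n * math.factorial(n) * math.sqrt(math.pi)) ** -0.5
         for n in range(n_max + 1)]
    gram = np.empty((n_max + 1, n_max + 1))
    for n in range(n_max + 1):
        for m in range(n_max + 1):
            gram[n, m] = c[n] * c[m] * np.sum(weights * vals[n] * vals[m])
    return gram
```

Under these assumptions `hermite_gram_matrix(6)` should agree with the identity matrix up to rounding, which is the orthonormality asserted in Theorem 17.5 restricted to the first seven Hermite functions.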


17.3.2 $I = \mathbb{R}^+$, $\rho(x) = e^{-x}$: Laguerre Polynomials


On the positive real line the exponential function $\rho(x) = e^{-x}$ certainly is a weight function. Hence our general results apply here and we obtain


Theorem 17.6 The system of Laguerre functions $\{\ell_n : n = 0, 1, 2, \ldots\}$ which is constructed by orthonormalization of the system $\{x^n e^{-x/2} : n = 0, 1, 2, \ldots\}$ in $L^2(\mathbb{R}^+, dx)$ is an orthonormal basis. These Laguerre functions have the following form (Rodrigues' formula):

$$\ell_n(x) = \frac{1}{n!} L_n(x)\, e^{-x/2}, \qquad L_n(x) = e^{x} \left(\frac{d}{dx}\right)^{n} \left(x^n e^{-x}\right), \qquad n = 0, 1, 2, \ldots. \tag{17.4}$$

For the system $\{L_n : n = 0, 1, 2, \ldots\}$ of Laguerre polynomials Theorem 17.4 applies.


In the Exercises we show that the Laguerre polynomials of different order are related according to the identity

$$(n + 1) L_{n+1}(x) + (x - 2n - 1) L_n(x) + n L_{n-1}(x) = 0, \tag{17.5}$$

and are solutions of the second order differential equation ($y = L_n(x)$)

$$x y'' + (1 - x) y' + n y = 0. \tag{17.6}$$

In quantum mechanics this differential equation is related to the radial Schrödinger equation for the hydrogen atom.
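The recursion (17.5) and the differential equation (17.6) can be tested symbolically on polynomial coefficients. The sketch below is an illustration only and rests on one assumption that should be stated loudly: it uses the normalization $L_n(0) = 1$ of `numpy.polynomial.laguerre`, which is the one appearing in Table 17.1. The helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import laguerre as L


def laguerre_poly(n):
    """Laguerre polynomial L_n with L_n(0) = 1 (assumed normalization)."""
    return Polynomial(L.lag2poly([0] * n + [1]))


def recursion_residual(n):
    """Left-hand side of (17.5) as a polynomial, for n >= 1."""
    x = Polynomial([0, 1])
    return ((n + 1) * laguerre_poly(n + 1)
            + (x - (2 * n + 1)) * laguerre_poly(n)
            + n * laguerre_poly(n - 1))


def ode_residual(n):
    """Left-hand side of (17.6) for y = L_n."""
    x = Polynomial([0, 1])
    y = laguerre_poly(n)
    return x * y.deriv(2) + (1 - x) * y.deriv(1) + n * y
```

Both residuals should have coefficients that vanish up to rounding. The differential equation is insensitive to the normalization of $L_n$, whereas the recursion in the form (17.5) is not.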


17.3.3 $I = [-1, +1]$, $\rho(x) = 1$: Legendre Polynomials


For any finite interval $I = [a, b]$, $-\infty < a < b < \infty$, one can take any positive constant as a weight function. Thus, Lemma 17.1 says that the system of powers $\{x^n : n = 0, 1, 2, \ldots\}$ is a total system of functions in the Hilbert space $L^2([a, b], dx)$. It follows that every element $f \in L^2([a, b], dx)$ is the limit of a sequence of polynomials in the $L^2$-norm. Compare this with the Theorem of Stone-Weierstrass, which says that every continuous function on $[a, b]$ is the uniform limit of a sequence of polynomials.

For the special case of the interval $I = [-1, 1]$ the Gram-Schmidt orthonormalization of the system of powers leads to a well-known system of polynomials.


Theorem 17.7 The system of Legendre polynomials

$$P_n(x) = \frac{1}{2^n n!} \left(\frac{d}{dx}\right)^{n} (x^2 - 1)^n, \qquad x \in [-1, 1], \quad n = 0, 1, 2, \ldots \tag{17.7}$$

is an orthogonal basis of the Hilbert space $L^2([-1, 1], dx)$. The Legendre polynomials are normalized according to the relation

$$\langle P_n, P_m \rangle_2 = \frac{2}{2n + 1}\, \delta_{nm}.$$

Again one can show that these polynomials satisfy a recursion relation and a second order differential equation (see Exercises):

$$(n + 1) P_{n+1}(x) - (2n + 1) x P_n(x) + n P_{n-1}(x) = 0, \tag{17.8}$$

$$(1 - x^2) y'' - 2x y' + n(n + 1) y = 0, \tag{17.9}$$

where $y = P_n(x)$.
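Both the normalization in Theorem 17.7 and the separation of zeros asserted by the Knotensatz lend themselves to a quick numerical check. The following sketch assumes NumPy and uses its Gauss-Legendre quadrature; `legendre_gram_matrix` and `zeros_interlace` are hypothetical names chosen for this illustration.

```python
import numpy as np
from numpy.polynomial import legendre as Leg


def legendre_gram_matrix(n_max):
    """Gram matrix <P_n, P_m> in L^2([-1, 1], dx) for n, m = 0..n_max."""
    # Gauss-Legendre quadrature with n_max + 1 nodes is exact for
    # polynomials of degree <= 2 * n_max + 1.
    nodes, weights = Leg.leggauss(n_max + 1)
    vals = [Leg.legval(nodes, [0] * n + [1]) for n in range(n_max + 1)]
    return np.array([[np.sum(weights * vals[n] * vals[m])
                      for m in range(n_max + 1)]
                     for n in range(n_max + 1)])


def zeros_interlace(k):
    """True if exactly one zero of P_{k-1} lies between successive zeros of P_k."""
    zk = np.sort(Leg.legroots([0] * k + [1]).real)
    zk1 = np.sort(Leg.legroots([0] * (k - 1) + [1]).real)
    counts = [np.sum((zk1 > a) & (zk1 < b)) for a, b in zip(zk[:-1], zk[1:])]
    return all(c == 1 for c in counts)
```

The Gram matrix should be diagonal with entries $2/(2n+1)$, and `zeros_interlace(k)` should hold for every $k \geq 2$ that one tries, in line with Theorem 17.4.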


Fig. 17.1 Legendre polynomials $P_3$, $P_4$, $P_5$


Without further details we mention the weight functions for some other systems of orthogonal polynomials on the interval $[-1, 1]$:

Jacobi $P_n^{(\nu, \mu)}$: $\rho(x) = (1 - x)^{\nu} (1 + x)^{\mu}$, $\nu, \mu > -1$,
Gegenbauer $C_n^{\lambda}$: $\rho(x) = (1 - x^2)^{\lambda - 1/2}$, $\lambda > -1/2$,
Tschebyschew 1st kind: $\rho(x) = (1 - x^2)^{-1/2}$,
Tschebyschew 2nd kind: $\rho(x) = (1 - x^2)^{1/2}$.


We conclude this section with an illustration of the Knotensatz for some Legendre polynomials of low order. The graph in Fig. 17.1 clearly shows that the zeros of the polynomial $P_k$ are separated by the zeros of the polynomial $P_{k+1}$, $k = 3, 4$. In addition, the polynomials are listed explicitly up to order $n = 6$ in Table 17.1.




17.4 Exercises 


1. Prove: A Hilbert space $\mathcal{H}$ is separable if, and only if, $\mathcal{H}$ contains a countable dense subset.

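2. The space of almost-periodic functions: In the space of complex-valued measurable functions on $\mathbb{R}$ consider the vector space $F$ which is generated by the exponential functions $e_\lambda$, $\lambda \in \mathbb{R}$; here $e_\lambda : \mathbb{R} \to \mathbb{C}$ is defined by $e_\lambda(x) = e^{i\lambda x}$ for all $x \in \mathbb{R}$. Thus elements $g$ in $F$ are of the form $g = \sum_{k=1}^{N} a_k e_{\lambda_k}$ for some choice of $N \in \mathbb{N}$, $a_k \in \mathbb{C}$, and $\lambda_k \in \mathbb{R}$. On $F$ we define

$$\langle g, f \rangle = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \overline{g(x)}\, f(x)\,dx.$$

a) Show that $\langle \cdot, \cdot \rangle$ defines an inner product on $F$.

b) Complete the inner product space $(F, \langle \cdot, \cdot \rangle)$ to get a Hilbert space $\mathcal{H}_{ap}$, called the space of almost periodic functions on $\mathbb{R}$.

c) Show that $\mathcal{H}_{ap}$ is not separable.

Hints: Show that $\{e_\lambda : \lambda \in \mathbb{R}\}$ is an orthonormal system in $\mathcal{H}_{ap}$ which is not countable.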

3. Consider the functions $e_n$, $n \in \mathbb{Z}$, defined on the interval $[0, 2\pi]$ by $e_n(x) = \frac{1}{\sqrt{2\pi}} e^{inx}$. Prove: This system is an orthonormal basis of the Hilbert space $L^2([0, 2\pi], dx)$.

4. Prove that the function $F$ in Lemma 17.1 is well-defined and holomorphic in the strip $S_\alpha$.

Hints: For $p = u + iv \in S_\alpha$ write $ipx = \alpha|x| + ixu - |x|(\alpha + v\,\mathrm{sign}\,x)$, group terms appropriately and estimate.

5. Let $\rho$ be a weight function on the interval $I$. Show: The Hilbert spaces $L^2(I, \rho\,dx)$ and $L^2(I, dx)$ are isomorphic under the map $f \mapsto \sqrt{\rho}\, f$.

6. Let $P_k$, $k = 0, 1, 2, \ldots$, be the system of orthonormal polynomials for the interval $I$ and the weight function $\rho$. Then the polynomial $Q_k(x, \lambda) = P_k(x) + \lambda P_{k-1}(x)$ has $k$ simple real roots, for any $\lambda \in \mathbb{R}$.

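7. Under the assumptions of the previous problem show: The functional

$$f(a_0, \ldots, a_n) = \int_a^b \Big( u(x) - \sum_{k=0}^{n} a_k P_k(x) \Big)^{2} \rho(x)\,dx$$

is minimized by the choice $a_k = \langle P_k, u \rangle_{\rho}$, $k = 0, 1, \ldots, n$, where the inner product is that of $L^2(I, \rho\,dx)$. Here $u$ is a given continuous function.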

8. For $n = 0, 1, 2, 3, 4$ calculate the Hermite functions $h_n$, the Laguerre functions $\ell_n$, and the Legendre polynomials $P_n$ explicitly in two ways, first by going through the Gram-Schmidt orthonormalization and then by using the representation of these functions in terms of differentiation of the generating functions given in the last section.

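9. Show that the Hermite polynomials have the generating function $e^{2xt - t^2}$, i.e.,

$$e^{2xt - t^2} = \sum_{n=0}^{\infty} H_n(x) \frac{t^n}{n!}.$$


Table 17.1 Orthogonal polynomials of order ≤ 6

n | $H_n(x)$ | $L_n(x)$ | $P_n(x)$
1 | $2x$ | $1 - x$ | $x$
2 | $4x^2 - 2$ | $1 - 2x + \frac{1}{2}x^2$ | $\frac{3}{2}x^2 - \frac{1}{2}$
3 | $8x^3 - 12x$ | $1 - 3x + \frac{3}{2}x^2 - \frac{1}{6}x^3$ | $\frac{5}{2}x^3 - \frac{3}{2}x$
4 | $16x^4 - 48x^2 + 12$ | $1 - 4x + 3x^2 - \frac{2}{3}x^3 + \frac{1}{24}x^4$ | $\frac{35}{8}x^4 - \frac{15}{4}x^2 + \frac{3}{8}$
5 | $32x^5 - 160x^3 + 120x$ | $1 - 5x + 5x^2 - \frac{5}{3}x^3 + \frac{5}{24}x^4 - \frac{1}{120}x^5$ | $\frac{63}{8}x^5 - \frac{35}{4}x^3 + \frac{15}{8}x$
6 | $64x^6 - 480x^4 + 720x^2 - 120$ | $1 - 6x + \frac{15}{2}x^2 - \frac{10}{3}x^3 + \frac{5}{8}x^4 - \frac{1}{20}x^5 + \frac{1}{720}x^6$ | $\frac{231}{16}x^6 - \frac{315}{16}x^4 + \frac{105}{16}x^2 - \frac{5}{16}$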

10. Use the last result to show that the Hermite functions are eigenfunctions of the Fourier transform: $\mathcal{F}(h_n)(p) = (-i)^n h_n(p)$, $n = 0, 1, 2, \ldots$. (See also Proposition 24.2.2).

11. Prove the recursion relations (17.2), (17.5), and (17.8). 

12. Prove the differential equations (17.3), (17.6), and (17.9) by using the rep- 
resentation of these functions in terms of differentiation of the generating 
functions. 


References 


1. Edwards RE. Fourier-series. A modern introduction. vol. 1. 2nd ed. New York: Springer-Verlag; 
1979. 

2. Amrein WO. Non-relativistic quantum dynamics. Dordrecht: Reidel; 1981. 

3. Galindo A, Pascual P. Quantum mechanics I. Texts and Monographs in Physics. Berlin: Springer-Verlag; 1990.

4. Thirring W. A course in mathematical physics : classical dynamical systems and classical field 
theory. Springer study edition. New York: Springer-Verlag; 1992. 


Chapter 18 
Direct Sums and Tensor Products 


Two constructions are frequently used to form new Hilbert spaces out of a finite or infinite set of given Hilbert spaces. Both constructions are quite important in quantum mechanics and in quantum field theory. This brief chapter introduces these constructions and discusses some examples from physics.


18.1 Direct Sums of Hilbert Spaces 


Recall the construction of the first Hilbert space by D. Hilbert, the space of square summable sequences $\ell^2(\mathbb{K})$ over the field $\mathbb{K}$. Here we take infinitely many copies of the Hilbert space $\mathbb{K}$, take from each copy an element to form a sequence of elements, and define this space as the space of all those sequences for which the squares of the norms of these elements form a summable sequence of real numbers. This construction will be generalized by replacing the infinitely many copies of the Hilbert space $\mathbb{K}$ by a countable set of given Hilbert spaces and carrying out the same construction.

Let us first explain the construction of the direct sum of a finite number of Hilbert spaces. Suppose we are given two Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ over the same field $\mathbb{K}$. Consider the set $\mathcal{H}_1 \times \mathcal{H}_2$ of ordered pairs $(x_1, x_2)$, $x_i \in \mathcal{H}_i$, of elements in these spaces and equip this set in a natural way with the structure of a vector space over the field $\mathbb{K}$. To this end, one defines addition and scalar multiplication on $\mathcal{H}_1 \times \mathcal{H}_2$ as follows:

$$(x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2) \qquad \forall\, x_i, y_i \in \mathcal{H}_i,\ i = 1, 2,$$

$$\lambda \cdot (x_1, x_2) = (\lambda x_1, \lambda x_2) \qquad \forall\, x_i \in \mathcal{H}_i,\ \forall\, \lambda \in \mathbb{K}.$$

It is straightforward to show that with this addition and scalar multiplication the set $\mathcal{H}_1 \times \mathcal{H}_2$ is a vector space over the field $\mathbb{K}$. Next we define a scalar product on this vector space. If $\langle \cdot, \cdot \rangle_i$ denotes the inner product of the Hilbert space $\mathcal{H}_i$, $i = 1, 2$, one


© Springer International Publishing Switzerland 2015 295 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_18 


defines an inner product $\langle \cdot, \cdot \rangle$ on the vector space $\mathcal{H}_1 \times \mathcal{H}_2$ by

$$\langle (x_1, x_2), (y_1, y_2) \rangle = \langle x_1, y_1 \rangle_1 + \langle x_2, y_2 \rangle_2 \qquad \forall\, x_i, y_i \in \mathcal{H}_i,\ i = 1, 2.$$

In the exercises one is asked to verify that this expression indeed defines an inner product on $\mathcal{H}_1 \times \mathcal{H}_2$. In another exercise it is shown that the resulting inner product space is complete and thus a Hilbert space. This Hilbert space is denoted by $\mathcal{H}_1 \oplus \mathcal{H}_2$ and is called the direct sum of the Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$.
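For finite dimensional spaces the direct sum can be modelled very directly. The sketch below is an illustration under the assumption $\mathcal{H}_1 = \mathbb{C}^m$ and $\mathcal{H}_2 = \mathbb{C}^n$; elements of the direct sum are represented as Python pairs of NumPy arrays, and the function names are hypothetical.

```python
import numpy as np


def direct_sum_inner(x, y):
    """Inner product on H_1 (+) H_2 for pairs x = (x1, x2), y = (y1, y2)."""
    # np.vdot conjugates its first argument, which matches an inner
    # product that is antilinear in the first slot.
    return np.vdot(x[0], y[0]) + np.vdot(x[1], y[1])


def direct_sum_norm(x):
    """Norm induced by the direct sum inner product."""
    return np.sqrt(direct_sum_inner(x, x).real)
```

In this model the direct sum $\mathbb{C}^m \oplus \mathbb{C}^n$ is nothing but $\mathbb{C}^{m+n}$ with its standard inner product: concatenating the two components of a pair gives the same inner products and norms.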

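Now assume that a countable set $\mathcal{H}_i$, $i \in \mathbb{N}$, of Hilbert spaces over the same field $\mathbb{K}$ is given. Consider the set $\mathcal{H}$ of all sequences $x = (x_i)_{i \in \mathbb{N}}$ with $x_i \in \mathcal{H}_i$ for all $i \in \mathbb{N}$ such that

$$\sum_{i=1}^{\infty} \|x_i\|_i^2 < \infty, \tag{18.1}$$

where $\| \cdot \|_i$ denotes the norm of the Hilbert space $\mathcal{H}_i$. On this set of all such sequences the structure of a vector space over the field $\mathbb{K}$ is introduced in a natural way by defining addition and scalar multiplication as follows:

$$(x_i)_{i \in \mathbb{N}} + (y_i)_{i \in \mathbb{N}} = (x_i + y_i)_{i \in \mathbb{N}} \qquad \forall\, x_i, y_i \in \mathcal{H}_i,\ i \in \mathbb{N}, \tag{18.2}$$

$$\lambda \cdot (x_i)_{i \in \mathbb{N}} = (\lambda x_i)_{i \in \mathbb{N}} \qquad \forall\, \lambda \in \mathbb{K},\ \forall\, x_i \in \mathcal{H}_i,\ i \in \mathbb{N}. \tag{18.3}$$

It is again an easy exercise to show that with this addition and scalar multiplication the set $\mathcal{H}$ is indeed a vector space over the field $\mathbb{K}$. If $\langle \cdot, \cdot \rangle_i$ denotes the inner product of the Hilbert space $\mathcal{H}_i$, $i \in \mathbb{N}$, an inner product on the vector space $\mathcal{H}$ is defined by

$$\langle (x_i)_{i \in \mathbb{N}}, (y_i)_{i \in \mathbb{N}} \rangle = \sum_{i=1}^{\infty} \langle x_i, y_i \rangle_i \qquad \forall\, (x_i)_{i \in \mathbb{N}}, (y_i)_{i \in \mathbb{N}} \in \mathcal{H}. \tag{18.4}$$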


The proof is left as an exercise. Equipped with this inner product, # is an inner 
product space. The following theorem states that H is complete, and thus a Hilbert 
space. 


Theorem 18.1 Suppose that a countable set of Hilbert spaces $\mathcal{H}_i$, $i \in \mathbb{N}$, over the field $\mathbb{K}$ is given. On the set $\mathcal{H}$ of all sequences $x = (x_i)_{i \in \mathbb{N}}$ satisfying condition (18.1), define a vector space structure by relations (18.2), (18.3) and an inner product by relation (18.4). Then $\mathcal{H}$ is a Hilbert space over $\mathbb{K}$, called the Hilbert sum or direct sum of the Hilbert spaces $\mathcal{H}_i$, $i \in \mathbb{N}$, and is denoted by

$$\mathcal{H} = \bigoplus_{i=1}^{\infty} \mathcal{H}_i. \tag{18.5}$$

If all the Hilbert spaces $\mathcal{H}_i$, $i \in \mathbb{N}$, are separable, then the direct sum $\mathcal{H}$ is separable too.


Proof Only completeness and separability of the inner product space remain to be proven. In its main steps the proof of completeness is the same as the proof of completeness of the sequence space $\ell^2(\mathbb{K})$ given earlier.




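Given a Cauchy sequence $(x^{(n)})_{n \in \mathbb{N}}$ in $\mathcal{H}$ and any $\varepsilon > 0$, there is an $n_0 \in \mathbb{N}$ such that

$$\|x^{(n)} - x^{(m)}\| < \varepsilon \qquad \forall\, n, m \geq n_0.$$

Each element $x^{(n)}$ of this sequence is itself a sequence $(x_i^{(n)})_{i \in \mathbb{N}}$. Thus, in terms of the inner product (18.4), this Cauchy condition means

$$\sum_{i=1}^{\infty} \|x_i^{(n)} - x_i^{(m)}\|_i^2 < \varepsilon^2 \qquad \forall\, n, m \geq n_0. \tag{18.6}$$

It follows that for every $i \in \mathbb{N}$ the sequence $(x_i^{(n)})_{n \in \mathbb{N}}$ is actually a Cauchy sequence in the Hilbert space $\mathcal{H}_i$ and thus converges to a unique element $x_i$ in this space:

$$x_i = \lim_{n \to \infty} x_i^{(n)} \qquad \forall\, i \in \mathbb{N}.$$

Condition (18.6) implies, for every $L \in \mathbb{N}$,

$$\sum_{i=1}^{L} \|x_i^{(n)} - x_i^{(m)}\|_i^2 < \varepsilon^2 \qquad \forall\, n, m \geq n_0, \tag{18.7}$$

and thus, by taking the limit $n \to \infty$ in this estimate, it follows that

$$\sum_{i=1}^{L} \|x_i - x_i^{(m)}\|_i^2 \leq \varepsilon^2 \qquad \forall\, m \geq n_0. \tag{18.8}$$

This estimate holds for all $L \in \mathbb{N}$ and the bound is independent of $L$. Therefore it also holds in the limit $L \to \infty$ (which obviously exists):

$$\sum_{i=1}^{\infty} \|x_i - x_i^{(m)}\|_i^2 \leq \varepsilon^2 \qquad \forall\, m \geq n_0. \tag{18.9}$$

Introducing the sequence $x = (x_i)_{i \in \mathbb{N}}$ of limit elements $x_i$ of the sequences $(x_i^{(n)})_{n \in \mathbb{N}}$, estimate (18.9) reads

$$\|x - x^{(m)}\| \leq \varepsilon \qquad \forall\, m \geq n_0.$$

Therefore, for any fixed $m \geq n_0$, $\|x\| \leq \|x - x^{(m)}\| + \|x^{(m)}\| \leq \varepsilon + \|x^{(m)}\|$, and it follows that the sequence $x$ is square summable, i.e., $x \in \mathcal{H}$, and that the given Cauchy sequence $(x^{(n)})_{n \in \mathbb{N}}$ converges in $\mathcal{H}$ to $x$. Thus the inner product space $\mathcal{H}$ is complete.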

The proof of separability is left as an exercise. 




18.2 Tensor Products 


Tensor products of Hilbert spaces are an essential tool in the description of multipar- 
ticle systems in quantum mechanics and in relativistic quantum field theory. There 
are several other areas in physics where tensor products, not only of Hilbert spaces 
but of vector spaces in general, play a prominent role. Certainly, in various areas of 
mathematics, the concept of tensor product is essential. Accordingly, we begin this 
section with a brief reminder of the tensor product of vector spaces and then discuss 
the special aspects of the tensor product of Hilbert spaces. 

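Given two vector spaces $E$ and $F$ over the same field $\mathbb{K}$, introduce the vector space $\Lambda = \Lambda(E, F)$ of all formal linear combinations

$$\sum_{j=1}^{N} a_j (x_j, y_j), \qquad a_j \in \mathbb{K},\ x_j \in E,\ y_j \in F,\ j = 1, \ldots, N,\ N \in \mathbb{N},$$

of ordered pairs $(x, y) \in E \times F$. Consider the following four types of elements of a special form in $\Lambda$:

$$(x, y_1 + y_2) - (x, y_1) - (x, y_2) \qquad x \in E,\ y_1, y_2 \in F$$

$$(x_1 + x_2, y) - (x_1, y) - (x_2, y) \qquad x_1, x_2 \in E,\ y \in F$$

$$(\lambda x, y) - \lambda (x, y) \qquad x \in E,\ y \in F,\ \lambda \in \mathbb{K}$$

$$(x, \lambda y) - \lambda (x, y) \qquad x \in E,\ y \in F,\ \lambda \in \mathbb{K}.$$

These special elements generate a linear subspace $\Lambda_0 \subset \Lambda$. The quotient space of $\Lambda$ with respect to this subspace $\Lambda_0$ is called the tensor product of $E$ and $F$ and is denoted by $E \otimes F$:

$$E \otimes F = \Lambda(E, F) / \Lambda_0. \tag{18.10}$$

By construction, $E \times F$ is a subset of $\Lambda(E, F)$; the restriction of the quotient map $Q : \Lambda(E, F) \to \Lambda(E, F) / \Lambda_0$ to this subset $E \times F$ is denoted by $\chi$, and the image of an element $(x, y) \in E \times F$ under $\chi$ is accordingly called the tensor product of $x$ and $y$,

$$\chi(x, y) = x \otimes y.$$

The calculation rules of the tensor product are

$$x \otimes (y_1 + y_2) = x \otimes y_1 + x \otimes y_2 \qquad x \in E,\ y_1, y_2 \in F \tag{18.11}$$

$$(x_1 + x_2) \otimes y = x_1 \otimes y + x_2 \otimes y \qquad x_1, x_2 \in E,\ y \in F \tag{18.12}$$

$$(\lambda x) \otimes y = \lambda (x \otimes y) \qquad x \in E,\ y \in F,\ \lambda \in \mathbb{K} \tag{18.13}$$

$$x \otimes (\lambda y) = \lambda (x \otimes y) \qquad x \in E,\ y \in F,\ \lambda \in \mathbb{K}. \tag{18.14}$$

The proof of these rules is left as an exercise.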
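In finite dimensions the calculation rules can be seen at work in coordinates. The sketch below is an illustration, assuming $E = \mathbb{R}^m$ and $F = \mathbb{R}^n$ with their standard bases, in which case the coordinates of $x \otimes y$ with respect to the product basis are given by the Kronecker product. The helper name `tensor` is hypothetical.

```python
import numpy as np


def tensor(x, y):
    """Coordinates of x (x) y in the product basis e_i (x) f_j (row-major order)."""
    return np.kron(x, y)


def check_calculation_rules(x1, x2, y1, y2, lam):
    """Return True if rules (18.11)-(18.14) hold for the given sample vectors."""
    return bool(
        np.allclose(tensor(x1, y1 + y2), tensor(x1, y1) + tensor(x1, y2))
        and np.allclose(tensor(x1 + x2, y1), tensor(x1, y1) + tensor(x2, y1))
        and np.allclose(tensor(lam * x1, y1), lam * tensor(x1, y1))
        and np.allclose(tensor(x1, lam * y1), lam * tensor(x1, y1))
    )
```

A check on sample vectors is of course no proof, but it makes the bilinearity of $(x, y) \mapsto x \otimes y$ concrete. Note that a general element of $E \otimes F$ is a sum of such products and need not itself be of the form $x \otimes y$.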




The important role of the tensor product in analysis comes from the following 
(universal) property, which roughly says that through the tensor product one can 
“linearize” bilinear maps. 


Theorem 18.2 Let $E, F, G$ be vector spaces over the field $\mathbb{K}$. Then, for every bilinear map $b : E \times F \to G$, there is a linear map $\ell : E \otimes F \to G$ such that

$$b(x, y) = \ell \circ \chi(x, y) = \ell(x \otimes y) \qquad \forall\, x \in E,\ y \in F.$$


Proof The bilinear map $b : E \times F \to G$ has a natural extension $B : \Lambda(E, F) \to G$ defined by $B\big(\sum_{i=1}^{N} a_i (x_i, y_i)\big) = \sum_{i=1}^{N} a_i b(x_i, y_i)$. By definition $B$ is linear. It is a small exercise to show that bilinearity of $b$ implies $B(t) = 0$ for all $t \in \Lambda_0$. This allows us to define a linear map $\ell : \Lambda(E, F) / \Lambda_0 \to G$ by $\ell \circ Q(t) = B(t)$ for all $t \in \Lambda(E, F)$. ($Q$ denotes again the quotient map.) Thus, for all $(x, y) \in E \times F$, one has $\ell \circ \chi(x, y) = B(x, y) = b(x, y)$.

In the first part on distribution theory, we introduced the tensor product of test function spaces and of distributions, for instance the tensor product $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$ for $\Omega_i \subseteq \mathbb{R}^{n_i}$, $i = 1, 2$, open and nonempty, in a direct way by defining, for all $f_i \in \mathcal{D}(\Omega_i)$, the tensor product $f_1 \otimes f_2$ as a function $\Omega_1 \times \Omega_2 \to \mathbb{K}$ with values $f_1 \otimes f_2(x_1, x_2) = f_1(x_1) f_2(x_2)$ for all $(x_1, x_2) \in \Omega_1 \times \Omega_2$. That this is a special case of the general construction given above is shown in the exercises.

Now, given two Hilbert spaces $\mathcal{H}_i$, $i = 1, 2$, we know what the algebraic tensor product $\mathcal{H}_1 \otimes \mathcal{H}_2$ of the two vector spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ is. If $\langle \cdot, \cdot \rangle_i$ denotes the inner product of the Hilbert space $\mathcal{H}_i$, we introduce on the vector space $\mathcal{H}_1 \otimes \mathcal{H}_2$ the inner product

$$\langle x_1 \otimes x_2, y_1 \otimes y_2 \rangle = \langle x_1, y_1 \rangle_1 \langle x_2, y_2 \rangle_2 \qquad \forall\, x_i, y_i \in \mathcal{H}_i,\ i = 1, 2. \tag{18.15}$$

Using the calculation rules of tensor products, this definition is extended to generic elements of the vector space $\mathcal{H}_1 \otimes \mathcal{H}_2$, and in the exercises we show that this indeed defines an inner product.

In general the inner product space $(\mathcal{H}_1 \otimes \mathcal{H}_2, \langle \cdot, \cdot \rangle)$ is not complete. However, according to Corollary A.1, the completion of an inner product space is a Hilbert space. This completion $\mathcal{H}_1 \tilde{\otimes} \mathcal{H}_2$ is called the tensor product of the Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ and is usually denoted as

$$\mathcal{H}_1 \otimes \mathcal{H}_2.$$

Note that in this notation the symbol ~ for the completion has been omitted.

For separable Hilbert spaces there is a direct construction of the tensor product in terms of an orthonormal basis. Suppose that $\{u_i : i \in \mathbb{N}\}$ is an orthonormal basis of the Hilbert space $\mathcal{H}_1$ and $\{v_i : i \in \mathbb{N}\}$ an orthonormal basis of $\mathcal{H}_2$. Now consider the system $S = \{(u_i, v_j) : i, j \in \mathbb{N}\} \subset \mathcal{H}_1 \times \mathcal{H}_2$. This system is orthonormal with respect to the inner product (18.15):

$$\langle (u_i, v_j), (u_p, v_q) \rangle = \langle u_i, u_p \rangle_1 \langle v_j, v_q \rangle_2 = \delta_{ip} \delta_{jq} \qquad \forall\, i, j, p, q \in \mathbb{N}.$$

The idea now is to define the tensor product $\mathcal{H}_1 \otimes \mathcal{H}_2$ as the Hilbert space in which the system $S$ is an orthonormal basis, i.e.,

$$\mathcal{H}_1 \otimes \mathcal{H}_2 = \Big\{ T = \sum_{i,j=1}^{\infty} a_{ij} (u_i, v_j) : a_{ij} \in \mathbb{K},\ \sum_{i,j=1}^{\infty} |a_{ij}|^2 < \infty \Big\}. \tag{18.16}$$

For two elements $T_1, T_2 \in \mathcal{H}_1 \otimes \mathcal{H}_2$ with coefficients $a_{ij}$ and $b_{ij}$, respectively, it follows easily that

$$\langle T_1, T_2 \rangle = \sum_{i,j=1}^{\infty} \overline{a_{ij}}\, b_{ij},$$

as one would expect. According to this construction, the tensor product of two separable Hilbert spaces is separable.

For every $x \in \mathcal{H}_1$ and $y \in \mathcal{H}_2$, one has $x = \sum_{i=1}^{\infty} a_i u_i$ with $a_i = \langle u_i, x \rangle_1$ and $y = \sum_{j=1}^{\infty} b_j v_j$ with $b_j = \langle v_j, y \rangle_2$, and thus $\langle (u_i, v_j), (x, y) \rangle = \langle u_i, x \rangle_1 \langle v_j, y \rangle_2 = a_i b_j$. Therefore the standard factorization follows:

$$\sum_{i,j=1}^{\infty} |\langle (u_i, v_j), (x, y) \rangle|^2 = \sum_{i,j=1}^{\infty} |a_i|^2 |b_j|^2 = \|x\|_1^2\, \|y\|_2^2.$$

By identifying the elements $(u_i, v_j)$ with $u_i \otimes v_j$, one can show that this construction leads to the same result as the general construction of the tensor product of two Hilbert spaces.
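The factorization of the coefficients can be observed numerically in a finite dimensional model. The following sketch assumes $\mathcal{H}_1 = \mathbb{C}^m$ and $\mathcal{H}_2 = \mathbb{C}^n$ and generates orthonormal bases from a QR factorization of a random matrix; both function names are hypothetical and serve only this illustration.

```python
import numpy as np


def random_orthonormal_basis(dim, rng):
    """Columns of a unitary matrix obtained from a QR factorization."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q


def product_coefficients(x, y, u, v):
    """Matrix with entries <u_i, x>_1 <v_j, y>_2 = a_i b_j."""
    a = u.conj().T @ x  # a_i = <u_i, x>, antilinear in the first argument
    b = v.conj().T @ y  # b_j = <v_j, y>
    return np.outer(a, b)
```

Summing the squared moduli of the entries returned by `product_coefficients` should reproduce $\|x\|_1^2 \|y\|_2^2$, independently of the orthonormal bases chosen.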

Without much additional effort the construction of the tensor product generalizes to more than two factors. Thus, given a finite number of vector spaces $E_1, \ldots, E_n$ over the field $\mathbb{K}$, the $n$-fold tensor product

$$E_1 \otimes \cdots \otimes E_n$$

is well defined and has properties similar to those of the tensor product of two vector spaces. In particular, for any $n$-linear map $b : E_1 \times \cdots \times E_n \to G$ into some vector space $G$ over the same field, there is a linear map $\ell : E_1 \otimes \cdots \otimes E_n \to G$ such that

$$b(x_1, \ldots, x_n) = \ell(x_1 \otimes \cdots \otimes x_n) \qquad \forall\, x_i \in E_i,\ i = 1, \ldots, n.$$

This applies in particular to the $n$-fold tensor product

$$\mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$$

of Hilbert spaces $\mathcal{H}_i$, $i = 1, \ldots, n$.


18.3 Some Applications of Tensor Products and Direct Sums 


18.3.1 State Space of Particles with Spin 


Originally, in quantum physics the state space (more precisely the space of wave functions) $\mathcal{H}$ for an elementary localizable particle was considered to be the Hilbert space of complex valued square integrable functions in configuration space $\mathbb{R}^3$, i.e., $\mathcal{H} = L^2(\mathbb{R}^3)$. Initially this state space was also used for the quantum mechanical description of an electron. Later, through several experiments (Stern-Gerlach, Zeeman), one learned that the electron has an additional internal degree of freedom with two possible values. This internal degree of freedom is called spin. Hence the state space for the electron had to be extended by these two additional degrees of freedom, and accordingly the state space of the electron is taken to be

$$\mathcal{H}_e = L^2(\mathbb{R}^3) \otimes \mathbb{C}^2 = L^2(\mathbb{R}^3, \mathbb{C}^2). \tag{18.17}$$

Note that $L^2(\mathbb{R}^3, \mathbb{C}^2)$ is the Hilbert space of all square integrable functions $\psi : \mathbb{R}^3 \to \mathbb{C}^2$ with inner product $\langle \psi, \phi \rangle = \sum_{j=1}^{2} \int_{\mathbb{R}^3} \overline{\psi_j(x)}\, \phi_j(x)\,dx$ for all $\psi, \phi \in L^2(\mathbb{R}^3, \mathbb{C}^2)$.

Later, other elementary particles were discovered with $p > 2$ internal degrees of freedom. Accordingly, their state space was taken to be

$$L^2(\mathbb{R}^3) \otimes \mathbb{C}^p = L^2(\mathbb{R}^3, \mathbb{C}^p).$$

The validity of this identity is shown in the exercises.

Actually the theory of these internal degrees of freedom or spins is closely related to the representation theory of the group $SU(2)$ (see [1]). $\mathbb{C}^2$ is the representation space of the irreducible representation $D_{1/2}$ of $SU(2)$ and similarly, $\mathbb{C}^{2s+1}$ is the representation space of the irreducible representation $D_s$ of $SU(2)$, $s = n/2$, $n = 0, 1, 2, 3, \ldots$.


18.3.2 State Space of Multi Particle Quantum Systems 


In the quantum mechanical description of multiparticle systems, the question naturally arises of how the states of the multiparticle system are related to the single particle states of the particles that constitute the multiparticle system. The answer is given by the tensor product of Hilbert spaces. According to the principles of quantum mechanics, the state space $\mathcal{H}_n$ of an $n$-particle system of $n$ identical particles with state space $\mathcal{H}_1$ is

$$\mathcal{H}_n = \mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_1 \qquad n \text{ factors}, \tag{18.18}$$

or a certain subspace thereof depending on the type of particle.

Empirically one found that there are two types of particles, bosons and fermions. The spin of bosons has an integer value $s = 0, 1, 2, \ldots$ while fermions have a spin with half-integer values, i.e., $s = \frac{1}{2}, \frac{3}{2}, \frac{5}{2}, \ldots$. The $n$-particle state space of $n$ identical bosons is the totally symmetric $n$-fold tensor product of the one particle state space, i.e.,

$$\mathcal{H}_{n,b} = \mathcal{H}_1 \otimes_s \cdots \otimes_s \mathcal{H}_1 \qquad n \text{ factors}, \tag{18.19}$$

and the $n$-particle state space of $n$ identical fermions is the totally antisymmetric tensor product of the one particle state space, i.e.,

$$\mathcal{H}_{n,f} = \mathcal{H}_1 \otimes_a \cdots \otimes_a \mathcal{H}_1 \qquad n \text{ factors}. \tag{18.20}$$

Here we use the following notation: $\phi \otimes_s \psi = \frac{1}{2}(\phi \otimes \psi + \psi \otimes \phi)$ and $\phi \otimes_a \psi = \frac{1}{2}(\phi \otimes \psi - \psi \otimes \phi)$, respectively. In the exercises some concrete examples of multiparticle state spaces are studied.
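For two particles with a finite dimensional one particle space the symmetric and antisymmetric products can be written down as matrices. The sketch below is an illustration under the assumption $\mathcal{H}_1 = \mathbb{C}^d$, with $\phi \otimes \psi$ represented by the Kronecker product; the function names are hypothetical.

```python
import numpy as np


def swap_operator(d):
    """Matrix of the flip phi (x) psi -> psi (x) phi on C^d (x) C^d."""
    swap = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            # e_i (x) e_j has index i*d + j in the Kronecker ordering.
            swap[j * d + i, i * d + j] = 1.0
    return swap


def symmetrizer(d):
    """Projector onto the symmetric part: phi (x)_s psi = S (phi (x) psi)."""
    return 0.5 * (np.eye(d * d) + swap_operator(d))


def antisymmetrizer(d):
    """Projector onto the antisymmetric part: phi (x)_a psi = A (phi (x) psi)."""
    return 0.5 * (np.eye(d * d) - swap_operator(d))
```

In this model the two projectors have ranks $d(d+1)/2$ and $d(d-1)/2$, and the antisymmetrizer annihilates every product $\phi \otimes \phi$, which is the two particle form of the Pauli exclusion principle.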

In relativistic quantum field theory, one considers systems in which elementary particles can be created and annihilated. Thus one needs a state space that allows the description of any number of particles and that allows a change of particle numbers.

Suppose we consider such a system composed of bosons with one particle state space $\mathcal{H}_1$. Then the Boson Fock space over $\mathcal{H}_1$

$$\mathcal{H}_B = \bigoplus_{n=0}^{\infty} \mathcal{H}_{n,b},$$

where $\mathcal{H}_{0,b} = \mathbb{C}$, and $\mathcal{H}_{n,b}$ is given in (18.19), is a Hilbert space, which allows the description of a varying number of bosons.

Similarly, the Fermion Fock space over the one particle state space $\mathcal{H}_1$,

$$\mathcal{H}_F = \bigoplus_{n=0}^{\infty} \mathcal{H}_{n,f},$$

where again $\mathcal{H}_{0,f} = \mathbb{C}$, and $\mathcal{H}_{n,f}$ is given in (18.20), is a Hilbert space, which allows the description of a varying number of fermions.

We conclude this chapter with the remark that in relativistic quantum field theory 
one can explain, on the basis of well established physical principles, why the n- 
particle space of bosons has to be a totally symmetric and that of fermions a totally 
antisymmetric tensor product of their one particle state space (for a theorem on the 
connection between spin and statistics, see [1—4]). 


18.4 Exercises 


1. Prove: Through formula (18.15) a scalar product is well defined on the tensor product $\mathcal{H}_1 \otimes \mathcal{H}_2$ of two Hilbert spaces $\mathcal{H}_i$, $i = 1, 2$.

2. Complete the proof of Theorem 18.1, i.e., show: If all the Hilbert spaces $\mathcal{H}_i$, $i \in \mathbb{N}$, are separable, so is the direct sum $\mathcal{H} = \bigoplus_{i=1}^{\infty} \mathcal{H}_i$.

3. Prove the calculation rules for tensor products.

4. Show that the definition of the tensor product $\mathcal{D}(\Omega_1) \otimes \mathcal{D}(\Omega_2)$ of test function spaces $\mathcal{D}(\Omega_i)$ is a special case of the tensor product of vector spaces.

5. Prove the statements in the text about the $n$-fold tensor product for $n > 2$.

6. On the Hilbert space $\mathbb{C}^2$ consider the matrices

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{18.21}$$

Show that these matrices are self-adjoint on $\mathbb{C}^2$, i.e., $\sigma^* = \sigma$ (for the definition of the adjoint $\sigma^*$, see the beginning of Sect. 19.2) and satisfy the relations

$$\sigma_x \sigma_y = -\sigma_y \sigma_x = i\sigma_z, \qquad \sigma_y \sigma_z = -\sigma_z \sigma_y = i\sigma_x, \qquad \sigma_z \sigma_x = -\sigma_x \sigma_z = i\sigma_y. \tag{18.22}$$

The matrices $\sigma_x, \sigma_y, \sigma_z$ are called the Pauli matrices. In quantum physics they are used for the description of the spin of a particle.


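7. Show that the three Pauli matrices together with the two dimensional unit matrix $I_2$ form a basis of the space $M_2(\mathbb{C})$ of all $2 \times 2$ matrices and deduce that every self-adjoint $A^* = A \in M_2(\mathbb{C})$ can be represented in the form $A = \lambda I_2 + a \cdot \sigma$ with $\lambda = \frac{1}{2}\mathrm{Tr}(A) \in \mathbb{R}$ and $a = (a_x, a_y, a_z) \in \mathbb{R}^3$, with the convention $a \cdot \sigma = a_x \sigma_x + a_y \sigma_y + a_z \sigma_z$. Finally deduce that every projector on $\mathbb{C}^2$, i.e., every $E \in M_2(\mathbb{C})$ that satisfies $E^* = E = E^2$, has the representation

$$E = \frac{1}{2}(I_2 + e \cdot \sigma), \qquad e \in \mathbb{R}^3,\ |e| = 1. \tag{18.23}$$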
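Before attempting the last two exercises by hand, it can help to see the claimed relations hold numerically. The following sketch is a numerical check and not a proof; the constant and function names are hypothetical, and the coefficient formula $a_k = \frac{1}{2}\mathrm{Tr}(A\sigma_k)$ used in `pauli_decomposition` relies on the standard fact $\mathrm{Tr}(\sigma_j \sigma_k) = 2\delta_{jk}$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def pauli_decomposition(a_matrix):
    """Return (lam, a) with A = lam * I2 + a . sigma."""
    lam = 0.5 * np.trace(a_matrix)
    # Tr(sigma_j sigma_k) = 2 delta_jk, hence a_k = Tr(A sigma_k) / 2.
    a = np.array([0.5 * np.trace(a_matrix @ s)
                  for s in (SIGMA_X, SIGMA_Y, SIGMA_Z)])
    return lam, a


def projector(e):
    """E = (I2 + e . sigma) / 2 for a unit vector e in R^3, as in (18.23)."""
    e = np.asarray(e, dtype=float)
    return 0.5 * (I2 + e[0] * SIGMA_X + e[1] * SIGMA_Y + e[2] * SIGMA_Z)
```

For a self-adjoint input the coefficients returned by `pauli_decomposition` should be real up to rounding, and `projector(e)` should satisfy $E^* = E = E^2$ with trace 1 for every unit vector $e$.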


References 


1. Thirring W. Quantum mathematical physics - atoms, molecules and large systems. Vol. 3. Heidelberg: Springer-Verlag; 2002.

2. Reed M, Simon B. Fourier analysis. Self-adjointness. Methods of modern mathematical physics. Vol. 2. New York: Academic; 1975.

3. Jost R. The general theory of quantized fields. Providence: American Mathematical Society; 1965.

4. Streater RF, Wightman AS. PCT, spin and statistics, and all that. New York: Benjamin; 1964.


Chapter 19 
Topological Aspects 


In our introduction, we stressed the analogy between Euclidean spaces and Hilbert 
spaces. This analogy works well as long as only the vector space and the geometric 
structures of a Hilbert space are concerned. But in the case of infinite dimensional 
Hilbert spaces there are essential differences when we look at topological structures 
on these spaces. It turns out that in an infinite dimensional Hilbert space the unit ball is 
not compact (with respect to the natural or norm topology) with the consequence that 
in such a case there are very few compact sets of interest for analysis. Accordingly 
a weaker topology in which the closed unit ball is compact is of great importance. 
This topology, called the weak topology, is studied in the second section to the extent 
needed in later chapters. 


19.1 Compactness 


We begin by recalling some basic concepts related to compact sets. If $M$ is a subset of a normed space $X$, a system $\mathcal{G}$ of subsets $G$ of $X$ is called a covering of $M$ if, and only if, $M \subset \bigcup_{G \in \mathcal{G}} G$. If all the sets in $\mathcal{G}$ are open, such a covering is called an open covering of $M$. A subset $K$ of $X$ is called compact if, and only if, every open covering of $K$ contains a finite subcovering, i.e., there are $G_1, \ldots, G_N \in \mathcal{G}$ such that $K \subset \bigcup_{i=1}^{N} G_i$.

It is important to be aware of the following basic facts about compact sets. A 
compact set K C X is closed and bounded in the normed space (X, || - ||). A closed 
subset of a compact set is compact. 

Every infinite sequence $(x_n)_{n \in \mathbb{N}}$ in a compact set $K$ contains a subsequence which converges to a point in $K$ (Theorem of Bolzano-Weierstrass). If $K$ is a set such that every infinite sequence in $K$ has a subsequence that converges to a point in $K$, then $K$ is called sequentially compact. One shows (see Exercises) that in a normed space a set is compact if, and only if, it is sequentially compact. This is very convenient in applications and is used frequently. B. Bolzano was the first to point out the significance of this property for a rigorous introduction to analysis.


© Springer International Publishing Switzerland 2015 265 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_19 




A continuous real valued function on a compact set is bounded, attains its minimal and maximal values (Theorem of Weierstrass), and is uniformly continuous (Theorem of Heine).

The covering theorem of Heine-Borel states that a subset $K \subset \mathbb{K}^n$ is compact if, and only if, it is closed and bounded. In infinite dimensional normed spaces, this equivalence is not true, as the following important theorem shows:


Theorem 19.1 (Theorem of F. Riesz) Suppose $(X, \| \cdot \|)$ is a normed space and $B = \overline{B}_1(0)$ denotes the closed unit ball with centre 0. Then $B$ is compact if, and only if, $X$ is finite dimensional.


Proof If $X$ is finite dimensional, then $B$ is compact because of the Heine-Borel covering theorem.

Conversely, assume that $B$ is a compact set in the normed space $(X, \| \cdot \|)$. The open ball with centre $a \in X$ and radius $r > 0$ is denoted by $B(a, r)$. Then $\mathcal{G} = \{B(a, r) : a \in B\}$ is an open covering of $B$ for any $r > 0$. Compactness of $B$ implies that there is a finite subcover, i.e., there are points $a_1, \ldots, a_N \in B$ such that

$$B \subseteq \bigcup_{i=1}^{N} B(a_i, r). \tag{19.1}$$

Now, observe $B(a_i, r) = a_i + rB(0, 1)$ and denote by $V$ the linear subspace of $X$ generated by the vectors $a_1, \ldots, a_N$. Certainly, $V$ has a dimension smaller than or equal to $N$ and is thus closed in $X$. Relation (19.1) implies

$$B \subseteq \bigcup_{i=1}^{N} \big(a_i + rB(0, 1)\big) \subseteq V + rB(0, 1) \subseteq V + rB. \tag{19.2}$$

By iterating this relation, we obtain, for $n = 1, 2, \ldots$,

$$B \subseteq V + r^n B. \tag{19.3}$$

Choose $0 < r < 1$. It follows that

$$B \subseteq \bigcap_{n \in \mathbb{N}} (V + r^n B) = \overline{V} = V.$$

Since $B$ is the closed unit ball of $X$, we know $X = \bigcup_{n=1}^{\infty} nB$ and thus

$$X \subseteq \bigcup_{n=1}^{\infty} nV = V.$$

Therefore, $X$ has a finite dimension smaller than or equal to $N$.

For an infinite dimensional Hilbert space, there is another proof of the fact that its closed unit ball is not compact. For such a Hilbert space, one can find an orthonormal system with infinitely many elements: $\{e_n : n \in \mathbb{N}\} \subset B$. For $n, m \in \mathbb{N}$, $n \neq m$, one has $\|e_n - e_m\| = \sqrt{2}$. Thus no subsequence of the sequence $(e_n)_{n \in \mathbb{N}}$ is a Cauchy sequence; therefore no subsequence converges, and hence $B$ is not sequentially compact.
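The distance computation behind this argument is easily reproduced. The sketch below works in a finite truncation of $\ell^2$, which is an assumption made purely for the purpose of computing: the first $n$ standard basis vectors are represented as rows of an identity matrix. The function name is hypothetical.

```python
import numpy as np


def pairwise_distances(n):
    """Distances ||e_i - e_j|| between the first n standard basis vectors."""
    e = np.eye(n)
    diff = e[:, None, :] - e[None, :, :]  # diff[i, j] = e_i - e_j
    return np.linalg.norm(diff, axis=2)
```

All off-diagonal entries equal $\sqrt{2}$, whatever the value of $n$. The truncation only illustrates the computation; the failure of compactness itself is a statement about the infinite sequence and cannot be seen in any fixed finite dimension, where the unit ball is compact.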


Remark 19.1 An obvious consequence of Theorem 19.1 is that in an infinite dimensional normed space $X$, compact sets have an empty interior. Hence in such a case, the only continuous function $f : X \to \mathbb{K}$ with compact support is the null function.
Recall that a space is called locally compact if, and only if, every point has a compact neighborhood. Hence, a locally compact normed space is finite dimensional.




19.2 The Weak Topology 


As the theorem of F. Riesz shows, the closed unit ball in an infinite dimensional 
Hilbert space H is not (sequentially) compact. We are going to introduce a weaker 
topology on H with respect to which the convenient characterization of compact sets 
as we know it from the Euclidean spaces $\mathbb{K}^n$ is available. In particular, the theorem
of Bolzano—Weierstrass is valid for this weak topology. Though we introduced the 
weak topology in the part on distributions, we repeat it for the present particular case. 


Definition 19.1 Let $X$ be a normed space and $X'$ its topological dual. The weak
topology on $X$, $\sigma(X, X')$, is the coarsest locally convex topology on $X$ such
that all $f \in X'$ are continuous. A basis of neighborhoods of a point $x_0 \in X$
for the topology $\sigma(X, X')$ is given by the following system of sets:

$$U(x_0; f_1, \ldots, f_n; r), \qquad f_1, \ldots, f_n \in X', \quad r > 0, \quad n \in \mathbb{N},$$
$$U(x_0; f_1, \ldots, f_n; r) = \{x \in X : |f_i(x - x_0)| < r, \ i = 1, \ldots, n\}.$$

In particular, for a Hilbert space $\mathcal{H}$, a basis of neighborhoods for the weak
topology is

$$U(x_0; y_1, \ldots, y_n; r), \qquad y_1, \ldots, y_n \in \mathcal{H}, \quad r > 0, \quad n \in \mathbb{N},$$
$$U(x_0; y_1, \ldots, y_n; r) = \{x \in \mathcal{H} : |\langle y_i, x - x_0 \rangle| < r, \ i = 1, \ldots, n\}.$$


Certainly, Corollary 19.3 has been used in the description of the elements of
a neighborhood basis for the weak topology of a Hilbert space. It is important to
be aware of the following elementary facts about the topology $\sigma = \sigma(X, X')$
of a normed space $X$. It has fewer open and thus fewer closed sets than the strong
or norm topology. Hence, if a subset $A \subset X$ is closed for $\sigma$ it is also
closed for the strong topology. But the converse does not hold in general. However,
for convex sets, we will learn later in this section that such a set is closed for
$\sigma$ if, and only if, it is closed for the strong topology.

In case of a finite dimensional normed space X, the weak and the strong topology 
coincide. One can actually show that this property characterizes finite dimensional 
normed spaces. This is discussed in the exercises. 

Though it should be clear from the above definition, we formulate the concepts 
of convergence for the weak topology explicitly. 


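Definition 19.2 Let $\mathcal{H}$ be a Hilbert space with inner product
$\langle \cdot, \cdot \rangle$ and $(x_n)_{n \in \mathbb{N}}$ a sequence in $\mathcal{H}$.

1. The sequence $(x_n)_{n \in \mathbb{N}}$ converges weakly to $x \in \mathcal{H}$ if, and
only if, for every $u \in \mathcal{H}$ the numerical sequence
$(\langle u, x_n \rangle)_{n \in \mathbb{N}}$ converges to the number $\langle u, x \rangle$.
$x$ is called the weak limit of the sequence $(x_n)_{n \in \mathbb{N}}$.

2. The sequence $(x_n)_{n \in \mathbb{N}}$ is a weak Cauchy sequence, i.e., a Cauchy
sequence for the weak topology, if, and only if, for every $u \in \mathcal{H}$ the
numerical sequence $(\langle u, x_n \rangle)_{n \in \mathbb{N}}$ is a Cauchy sequence.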




Some immediate consequences of these definitions are:

Lemma 19.1 Suppose $\mathcal{H}$ is a Hilbert space with inner product $\langle \cdot, \cdot \rangle$.


(a) A weakly convergent sequence is a weak Cauchy sequence. 
(b) A sequence has at most one weak limit. 
(c) Every infinite orthonormal system converges weakly to zero. 


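Proof Part (a) is obvious from the definition. For Part (b) assume that a sequence
$(x_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ has the points $x, y \in \mathcal{H}$ as weak
limits. For every $u \in \mathcal{H}$, it follows that

$$\langle u, x - y \rangle = \lim_{n \to \infty} \langle u, x_n - x_n \rangle = 0,$$

and hence $x - y \in \mathcal{H}^{\perp} = \{0\}$, thus $x = y$.

For Part (c) suppose $\{x_n : n \in \mathbb{N}\}$ is an infinite orthonormal system in
$\mathcal{H}$. For every $u \in \mathcal{H}$, Bessel's inequality (see Corollary 15.1)
implies that

$$\sum_{n=1}^{\infty} |\langle x_n, u \rangle|^2 \leq \|u\|^2 < \infty$$

and therefore $\langle x_n, u \rangle \to 0$. Since $u \in \mathcal{H}$ is arbitrary we conclude.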
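Part (c) can be illustrated numerically in a truncation of $\ell^2$. The sketch below assumes the standard basis vectors $e_n$ as the orthonormal system and the vector $u$ with components $u_k = 1/k$; both choices, and the truncation length `N`, are made only for illustration.

```python
def inner_with_basis(u, n):
    """<e_n, u> for the n-th standard basis vector e_n (1-based index)."""
    return u[n - 1]

N = 2000                                  # truncation length (illustrative)
u = [1.0 / k for k in range(1, N + 1)]    # u_k = 1/k, square summable

norm_u_sq = sum(c * c for c in u)
# Left-hand side of Bessel's inequality for the truncated system.
bessel_sum = sum(inner_with_basis(u, n) ** 2 for n in range(1, N + 1))
# The pairings <e_n, u> = 1/n shrink although every e_n has norm 1.
coefficients = [abs(inner_with_basis(u, n)) for n in (1, 10, 100, 1000)]
```

The pairings tend to zero while $\|e_n\| = 1$ for all $n$, so the sequence converges weakly to zero without converging in norm.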
Before we continue with some deeper results about the weak topology on a Hilbert 
space, we would like to pause a little for a heuristic discussion of the intuitive meaning 
of the concept of weak convergence. 
Consider the wave equation in one dimension

$$\partial_t^2 u - \partial_x^2 u = 0$$

where $\partial_t = \frac{\partial}{\partial t}$ and similarly $\partial_x = \frac{\partial}{\partial x}$,
and look for solutions $u$ which are in the Hilbert space $\mathcal{H} = L^2(\mathbb{R})$
with respect to the space variable $x$ for each time $t$, i.e.,
$u(\cdot, t) \in L^2(\mathbb{R})$ for each $t \geq 0$, given a smooth initial condition
$u_0 \in C^2(\mathbb{R})$ with support in the interval $[-1, 1]$ which is symmetric,
$u_0(-x) = u_0(x)$:

$$u(\cdot, 0) = u_0, \qquad \partial_t u(\cdot, 0) = 0.$$

The solution is easily found to be $u(x,t) = \frac{1}{2}\bigl(u_0(x - t) + u_0(x + t)\bigr)$.
Obviously, the support of $u(\cdot, t)$ is contained in the set
$S_t = [-1 - t, +1 - t] \cup [-1 + t, +1 + t]$.
For $t > 1$, the two functions $x \mapsto u_0(x - t)$ and $x \mapsto u_0(x + t)$ have
disjoint supports and thus for all $t > 1$,

$$\|u(\cdot, t)\|_2^2 = \int_{\mathbb{R}} |u(x,t)|^2 \, dx = \frac{1}{2} \|u_0\|_2^2.$$


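The support $S_t$ of $u(\cdot, t)$ moves to "infinity" as $t \to +\infty$. This implies
that $u(\cdot, t)$ converges weakly to $0$ as $t \to \infty$: For every
$v \in L^2(\mathbb{R})$, one finds

$$|\langle v, u(\cdot, t) \rangle_2| = \Bigl| \int_{S_t} \overline{v(x)}\, u(x,t) \, dx \Bigr|
\leq \Bigl( \int_{S_t} |v(x)|^2 \, dx \Bigr)^{1/2} \Bigl( \int_{S_t} |u(x,t)|^2 \, dx \Bigr)^{1/2}
\leq \frac{\|u_0\|_2}{\sqrt{2}} \Bigl( \int_{S_t} |v(x)|^2 \, dx \Bigr)^{1/2}.$$

Since $v \in L^2(\mathbb{R})$, given $\varepsilon > 0$ there is $R > 0$ such that
$\int_{|x| \geq R} |v(x)|^2 \, dx < \varepsilon^2$. For $|t|$ sufficiently large the
support $S_t$ is contained in $\{x \in \mathbb{R} : |x| \geq R\}$. Hence for such $t$
we can continue the above estimate by

$$\leq \|u_0\|_2 \, \varepsilon / \sqrt{2}$$

and we conclude that $\langle v, u(\cdot, t) \rangle_2 \to 0$ as $|t| \to \infty$.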
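The behaviour described in this example can be observed numerically. The sketch below is an illustration under assumptions that are not part of the text: the initial datum is taken to be $u_0(x) = (1 - x^2)^3$ on $[-1,1]$ (symmetric and $C^2$), the test function is $v(x) = e^{-x^2}$, and integrals are approximated by a midpoint rule.

```python
import math

def u0(x):
    """Symmetric C^2 bump supported in [-1, 1] (an illustrative choice)."""
    return (1.0 - x * x) ** 3 if abs(x) < 1.0 else 0.0

def u(x, t):
    """Solution u(x,t) = (u0(x - t) + u0(x + t)) / 2 with zero initial velocity."""
    return 0.5 * (u0(x - t) + u0(x + t))

def v(x):
    """A fixed real-valued test function in L^2(R)."""
    return math.exp(-x * x)

def integrate(f, a, b, steps=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def norm_sq(t):
    """Squared L^2 norm of u(., t); the support lies in [-t - 1, t + 1]."""
    return integrate(lambda x: u(x, t) ** 2, -t - 1.0, t + 1.0)

def pairing(t):
    """Inner product <v, u(., t)> for the real-valued v above."""
    return integrate(lambda x: v(x) * u(x, t), -t - 1.0, t + 1.0)

times = (2.0, 4.0, 8.0)
u0_norm_sq = integrate(lambda x: u0(x) ** 2, -1.0, 1.0)
norms = [norm_sq(t) for t in times]
pairings = [pairing(t) for t in times]
```

For these times the entries of `norms` stay at half of `u0_norm_sq`, while the entries of `pairings` decrease rapidly: the norm does not decay, only the overlap with a fixed test function does.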

The way in which weak convergence is achieved in this example is not atypical
for weak convergence in $L^2(\mathbb{R}^n)$. Later in our discussion of quantum
mechanical scattering theory we will encounter a similar phenomenon. There,
scattering states in $L^2(\mathbb{R}^n)$ will be defined as those functions
$t \mapsto \phi(\cdot, t) \in L^2(\mathbb{R}^n)$ for which

$$\lim_{|t| \to \infty} \int_{|x| \leq R} |\phi(x,t)|^2 \, dx = 0$$

for every $R \in (0, \infty)$.

How are strong and weak convergence related? Certainly, if a sequence
$(x_n)_{n \in \mathbb{N}}$ converges strongly to $x \in \mathcal{H}$, then it also
converges weakly and has the same limit. This follows easily from Schwarz'
inequality: $|\langle u, x - x_n \rangle| \leq \|u\| \, \|x - x_n\|$, for any
$u \in \mathcal{H}$. The relation between both concepts of convergence is fully
understood as the following theorem shows.


Theorem 19.2 Let $\mathcal{H}$ be a Hilbert space with inner product
$\langle \cdot, \cdot \rangle$ and $(x_n)_{n \in \mathbb{N}}$ a sequence in $\mathcal{H}$.
This sequence converges strongly to $x \in \mathcal{H}$ if, and only if, it converges
weakly to $x$ and $\lim_{n \to \infty} \|x_n\| = \|x\|$.


Proof That weak convergence is necessary for strong convergence has been shown
above. The basic estimate for norms

$$\bigl| \|x_n\| - \|x\| \bigr| \leq \|x - x_n\|$$

(see Corollary 15.2) implies that $\lim_{n \to \infty} \|x_n\| = \|x\|$.

In order to see that these two conditions are sufficient, consider a sequence which
converges weakly to $x \in \mathcal{H}$ and for which the sequence of norms converges
to the norm of $x$. Since the norm is defined in terms of the inner product, one has

$$\|x - x_n\|^2 = \langle x - x_n, x - x_n \rangle = \|x\|^2 + \|x_n\|^2 - \langle x, x_n \rangle - \langle x_n, x \rangle \qquad \forall n \in \mathbb{N}.$$

Weak convergence implies that

$$\lim_{n \to \infty} \langle x, x_n \rangle = \lim_{n \to \infty} \langle x_n, x \rangle = \|x\|^2.$$

Since also $\lim_{n \to \infty} \|x_n\| = \|x\|$ is assumed, we deduce
$\|x - x_n\|^2 \to 0$ as $n \to \infty$ and strong convergence follows.




There are some simple but important facts implied by these results.

• The open unit ball $B_1 = \{x \in \mathcal{H} : \|x\| < 1\}$ of an infinite
dimensional Hilbert space $\mathcal{H}$ is not open for the weak topology, since
otherwise every set which is open for the strong topology would be open for the
weak topology and thus both topologies would be identical.

• The unit sphere $S_1 = \{x \in \mathcal{H} : \|x\| = 1\}$ of an infinite dimensional
Hilbert space $\mathcal{H}$ is closed for the strong but not for the weak topology.
The weak closure of $S_1$, i.e., the closure of $S_1$ with respect to the weak
topology, is equal to the closed unit ball
$\overline{B}_1 = \{x \in \mathcal{H} : \|x\| \leq 1\}$ (see the exercises).


A first important step towards showing that the closed unit ball of a Hilbert space 
is compact for the weak topology is to show that strongly bounded sequences have 
weakly convergent subsequences. 


Theorem 19.3 Every sequence $(x_n)_{n \in \mathbb{N}}$ in a Hilbert space $\mathcal{H}$
which is strongly bounded, i.e., there is an $M < \infty$ such that $\|x_n\| \leq M$
for all $n \in \mathbb{N}$, has a weakly convergent subsequence.


Proof The given sequence generates a closed linear subspace
$\mathcal{H}_0 = [\{x_n : n \in \mathbb{N}\}]$ in $\mathcal{H}$.

Consider the numerical sequence $A_n^1 = \langle x_1, x_n \rangle$, $n = 1, 2, \ldots$.
By Schwarz' inequality it is bounded: $|A_n^1| \leq \|x_1\| \, \|x_n\| \leq M^2$. The
Bolzano-Weierstrass theorem ensures the existence of a convergent subsequence
$A^1_{n_1(j)} = \langle x_1, x_{n_1(j)} \rangle$, $j \in \mathbb{N}$. Next consider the
numerical sequence $A^2_{n_1(j)} = \langle x_2, x_{n_1(j)} \rangle$, $j \in \mathbb{N}$. It
too is bounded by $M^2$ and again by Bolzano-Weierstrass we can find a convergent
subsequence $A^2_{n_2(j)} = \langle x_2, x_{n_2(j)} \rangle$, $j \in \mathbb{N}$.

This argument can be iterated and thus generates a sequence
$(x_{n_i(j)})_{j \in \mathbb{N}}$, $i = 1, 2, \ldots$, of subsequences of our original
sequence with the property that $(x_{n_{i+1}(j)})_{j \in \mathbb{N}}$ is a subsequence
of $(x_{n_i(j)})_{j \in \mathbb{N}}$. Finally, we consider the diagonal sequence
$(x_{m(j)})_{j \in \mathbb{N}}$ where we use $m(j) = n_j(j)$. Then all numerical
sequences $(\langle x_k, x_{m(j)} \rangle)_{j \in \mathbb{N}}$, $k \in \mathbb{N}$, converge
since for $j \geq k$ this sequence is a subsequence of the convergent sequence
$(A^k_{n_k(j)})_{j \in \mathbb{N}}$. It follows that
$\lim_{j \to \infty} \langle x, x_{m(j)} \rangle$ exists for all
$x \in V = \operatorname{lin}\{x_n : n \in \mathbb{N}\}$. Hence
$\lim_{j \to \infty} \langle x_{m(j)}, x \rangle$ exists for all $x \in V$. We call this
limit $T(x)$. Basic rules of calculation imply that $T : V \to \mathbb{K}$ is linear.
The estimate $|\langle x_{m(j)}, x \rangle| \leq \|x\| \, \|x_{m(j)}\| \leq M \|x\|$ implies
$|T(x)| \leq M \|x\|$ and thus $T$ is a continuous linear functional on the subspace
$V$. The Extension Theorem 16.4 implies that there is a unique continuous linear
functional $\hat{T}$ on $\mathcal{H}$ which extends $T$ such that
$\|\hat{T}\| = \|T\|$. Furthermore, by Theorem 16.3, there is a unique vector
$y \in \mathcal{H}_0$ such that $\hat{T}(x) = \langle y, x \rangle$ for all
$x \in \mathcal{H}$, and we deduce that $y$ is the weak limit of the sequence
$(x_{m(j)})_{j \in \mathbb{N}}$ (first, we have
$\langle y, x \rangle = \lim_{j \to \infty} \langle x_{m(j)}, x \rangle$ for all $x \in V$,
then by continuous extension for all $x \in \mathcal{H}$; details are considered in
the exercises).

One of the fundamental principles of functional analysis is the uniform boundedness
principle. It is also widely used in the theory of Hilbert spaces. In Appendix 34.4
we prove this principle in the generality which is needed in the theory of generalized 
functions. In this section we give a direct proof for Banach spaces. This version 
obviously is sufficient for the theory of Hilbert spaces. 




Definition 19.3 Let $X$ be a Banach space with norm $\|\cdot\|$ and
$\{T_\alpha : \alpha \in A\}$ a family of continuous linear functionals on $X$ ($A$ an
arbitrary index set). One says that this family is

1. pointwise bounded if, and only if, for every $x \in X$ there is a real constant
$C_x < \infty$ such that

$$\sup_{\alpha \in A} |T_\alpha(x)| \leq C_x;$$

2. uniformly bounded or norm bounded if, and only if,

$$\sup_{\alpha \in A} \sup \{|T_\alpha(x)| : x \in X, \ \|x\| \leq 1\} = C < \infty.$$


Clearly, every uniformly bounded family of continuous linear functionals is pointwise
bounded. For a certain class of spaces (see Appendix 34.4) the converse is also
true and is called the principle of uniform boundedness or the uniform boundedness 
principle. It was first proven by Banach and Steinhaus for Banach spaces. 

We prepare for the proof of this fundamental result by an elementary lemma. 


Lemma 19.2 A family $\{T_\alpha : \alpha \in A\}$ of continuous linear functionals on a
Banach space $X$ is uniformly bounded if, and only if, this family is uniformly
bounded on some ball $B_r(x_0) = \{x \in X : \|x - x_0\| < r\}$, i.e.,

$$\sup_{\alpha \in A} \sup_{x \in B_r(x_0)} |T_\alpha(x)| = C < \infty.$$


Proof If the given family is uniformly bounded we know that there is some positive
constant $C_0$ such that $|T_\alpha(x)| \leq C_0$ for all $x \in B = B_1(0)$ and all
$\alpha \in A$. A ball $B_r(x_0)$ with centre $x_0$ and radius $r > 0$ is obtained from
the unit ball $B$ by translation and scaling: $B_r(x_0) = x_0 + rB$. Thus every
$x \in B_r(x_0)$ can be written as $x = x_0 + ry$ with $y \in B$ and therefore

$$|T_\alpha(x)| = |T_\alpha(x_0 + ry)| = |T_\alpha(x_0) + r T_\alpha(y)|
\leq |T_\alpha(x_0)| + r |T_\alpha(y)| \leq C_0 \|x_0\| + r C_0.$$

Hence the family $\{T_\alpha : \alpha \in A\}$ is uniformly bounded on the ball
$B_r(x_0)$ by $(r + \|x_0\|) C_0$.

Conversely, assume that the family $\{T_\alpha : \alpha \in A\}$ is uniformly bounded on
some ball $B_r(x_0)$ with bound $C$. The points $y$ in the unit ball $B$ have the
representation $y = (x - x_0)/r$ in terms of the points $x \in B_r(x_0)$. It follows,
for all $y \in B$ and all $\alpha \in A$:

$$|T_\alpha(y)| = \frac{1}{r} |T_\alpha(x - x_0)| \leq \frac{1}{r} \bigl( |T_\alpha(x)| + |T_\alpha(x_0)| \bigr) \leq \frac{2C}{r},$$

and we conclude.


Theorem 19.4 (Banach-Steinhaus) A family $\{T_\alpha : \alpha \in A\}$ of continuous
linear functionals on a Banach space $X$ is uniformly bounded if, and only if, it is
pointwise bounded.




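Proof Let $\mathcal{T} = \{T_\alpha : \alpha \in A\}$ be a pointwise bounded family of
continuous linear functionals on $X$. We prove the uniform bound

$$\sup_{\alpha \in A} \|T_\alpha\| < \infty$$

indirectly.

Assume that $\mathcal{T}$ is not uniformly bounded. Lemma 19.2 implies that
$\mathcal{T}$ is not uniformly bounded on any of the balls $B_r(x_0)$, $x_0 \in X$,
$r > 0$. It follows that for every $p \in \mathbb{N}$ there are an index
$\alpha_p \in A$ and a point $x_p \in B = B_1(0)$ such that $|T_{\alpha_p}(x_p)| > p$.

Begin with $p = 1$. Since $T_{\alpha_1}$ is continuous there is an $\varepsilon_1 > 0$
such that $|T_{\alpha_1}(x)| > 1$ for all $x \in B_{\varepsilon_1}(x_1)$. By choosing
$\varepsilon_1$ small enough we can ensure $B_{\varepsilon_1}(x_1) \subset B$. Again by
Lemma 19.2 we know that the family $\mathcal{T}$ is not uniformly bounded on the ball
$B_{\varepsilon_1}(x_1)$. Hence there are a point $x_2 \in B_{\varepsilon_1}(x_1)$ and an
index $\alpha_2 \in A$ such that $|T_{\alpha_2}(x_2)| > 2$. Continuity of $T_{\alpha_2}$
implies the existence of $\varepsilon_2 \in (0, \varepsilon_1/2)$ such that
$|T_{\alpha_2}(x)| > 2$ for all $x \in B_{\varepsilon_2}(x_2) \subset B_{\varepsilon_1}(x_1)$.

On the basis of Lemma 19.2 these arguments can be iterated. Thus, we obtain a
sequence of points $(x_p)_{p \in \mathbb{N}} \subset B$, a decreasing sequence of
positive numbers $\varepsilon_p$ and a sequence of indices $\alpha_p \in A$ such that

(a) $|T_{\alpha_p}(x)| > p$ for all $x \in B_{\varepsilon_p}(x_p)$;
(b) $B_{\varepsilon_{p+1}}(x_{p+1}) \subset B_{\varepsilon_p}(x_p)$ for all $p \in \mathbb{N}$;
(c) $0 < \varepsilon_{p+1} < \dfrac{\varepsilon_p}{2} \leq \dfrac{\varepsilon_1}{2^p}$.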


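Property (b) implies $\|x_{p+1} - x_p\| < \varepsilon_p$ and thus by (c), which gives
$\varepsilon_{p+i} \leq \varepsilon_1 / 2^{p+i-1}$, for all $m \in \mathbb{N}$:

$$\|x_{p+m} - x_p\| = \Bigl\| \sum_{i=0}^{m-1} (x_{p+i+1} - x_{p+i}) \Bigr\|
\leq \sum_{i=0}^{m-1} \|x_{p+i+1} - x_{p+i}\|
< \sum_{i=0}^{m-1} \varepsilon_{p+i}
\leq \sum_{i=0}^{m-1} \frac{\varepsilon_1}{2^{p+i-1}} \to 0 \quad \text{as } p \to \infty.$$

This shows that $(x_p)_{p \in \mathbb{N}}$ is a Cauchy sequence in the Banach space $X$,
hence it converges to a point $x \in X$. This point belongs to all the balls
$B_{\varepsilon_p}(x_p)$ because of (b). At this point $x$, the family $\mathcal{T}$ is
bounded by assumption. This is a contradiction to the construction according to
property (a), which gives $|T_{\alpha_p}(x)| > p$ for all $p \in \mathbb{N}$. We conclude
that the family $\mathcal{T}$ is uniformly bounded.


Remark 19.2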


1. The statement of the Banach-Steinhaus theorem can be rephrased as follows: If a
family $\{T_\alpha : \alpha \in A\}$ of continuous linear functionals on a Banach space
$X$ is not uniformly bounded, then there is a point $x_0 \in X$ such that
$\sup_{\alpha \in A} |T_\alpha(x_0)| = +\infty$.

2. One can also prove the principle of uniform boundedness by using the fact that a
Banach space $X$ is a Baire space, i.e., if $X$ is represented as the countable union
of closed sets $X_n$, $X = \bigcup_{n \in \mathbb{N}} X_n$, then at least one of the sets
$X_n$ must contain an open nonempty ball (see Appendix C). Given a pointwise bounded
family $\{T_\alpha : \alpha \in A\}$ of continuous linear functionals on $X$, we apply
this to the sets

$$X_n = \{x \in X : |T_\alpha(x)| \leq n \ \ \forall \alpha \in A\}, \qquad n \in \mathbb{N}.$$

The pointwise bounds ensure that the union of these sets $X_n$ represents $X$. It thus
follows that the family is bounded on some open ball and by Lemma 19.2 we
conclude.

3. The theorem of Riesz-Fréchet (Theorem 16.3) states that the continuous linear
functionals $T$ on a Hilbert space $\mathcal{H}$ can be identified with the points
$u \in \mathcal{H}$: $T = T_u$, $u \in \mathcal{H}$, $T_u(x) = \langle u, x \rangle$ for all
$x \in \mathcal{H}$. Theorem 19.4 implies: if a set $A \subset \mathcal{H}$ is weakly
bounded, then it is uniformly bounded, i.e., bounded in norm.
(See the exercises for details.)

4. In order to verify whether a set A is bounded (i.e., whether A is contained in 
some finite ball) it suffices, because of Theorem 19.4, to verify that it is weakly 
bounded. As in the case of a finite dimensional Hilbert space, this amounts to 
verifying that A is “bounded in every coordinate direction” and this is typically 
much easier. 
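The rephrasing in item 1 can be made concrete with a simple family on $\ell^2$. The sketch below assumes the functionals $T_n(x) = n\,x_n$ (so $\|T_n\| = n$, an unbounded family) and the vector with components $x_k = k^{-3/4}$; these choices and the truncation length `N` are illustrative and do not come from the text.

```python
def T(n, x):
    """T_n(x) = n * x_n, a continuous linear functional on l^2 of norm n."""
    return n * x[n - 1]

N = 10000                                     # truncation length (illustrative)
x = [k ** -0.75 for k in range(1, N + 1)]     # x_k = k^(-3/4)

# Partial sum of sum_k k^(-3/2); the full series converges, so x lies in l^2.
norm_sq = sum(c * c for c in x)

# T_n(x) = n^(1/4) grows without bound along n = 1, 16, 256, 4096.
values = [T(n, x) for n in (1, 16, 256, 4096)]
```

The partial sums `norm_sq` stay bounded, so $x \in \ell^2$, while the values $T_n(x) = n^{1/4}$ are unbounded: at this single point the family fails to be pointwise bounded.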


A weakly convergent sequence $(x_n)_{n \in \mathbb{N}}$ in a Hilbert space $\mathcal{H}$
is obviously pointwise bounded and thus bounded in norm. This proves


Lemma 19.3 Every weakly convergent sequence in a Hilbert space is bounded in
norm.

Now we are well prepared to prove the second major result of this section.


Theorem 19.5 Every Hilbert space H is sequentially complete with respect to the 
weak topology. 


Proof Suppose we are given a weak Cauchy sequence
$(x_n)_{n \in \mathbb{N}} \subset \mathcal{H}$. For every $u \in \mathcal{H}$ the numerical
sequence $(\langle x_n, u \rangle)_{n \in \mathbb{N}}$ then is a Cauchy sequence and thus
converges to some number in the field $\mathbb{K}$. Call this number $T(u)$. It follows
that this sequence is pointwise bounded. Hence it is norm bounded, i.e., there is
some constant $C \in [0, \infty)$ such that $\|x_n\| \leq C$ for all $n \in \mathbb{N}$.
Since $T(u) = \lim_{n \to \infty} \langle x_n, u \rangle$ it follows by Schwarz'
inequality that $|T(u)| \leq C \|u\|$. Basic rules of calculation for limits imply that
the function $T : \mathcal{H} \to \mathbb{K}$ is linear. Thus $T$ is a continuous linear
functional on $\mathcal{H}$, and we know that such functionals are of the form
$T = T_x$ for a unique vector $x \in \mathcal{H}$, $T_x(u) = \langle x, u \rangle$ for all
$u \in \mathcal{H}$. We conclude that
$\langle x, u \rangle = \lim_{n \to \infty} \langle x_n, u \rangle$ for $u \in \mathcal{H}$.
Hence the Cauchy sequence $(x_n)_{n \in \mathbb{N}}$ converges weakly to the point
$x \in \mathcal{H}$. The Hilbert space $\mathcal{H}$ is weakly sequentially complete.


Theorem 19.6 (Banach-Saks) Suppose that $(x_n)_{n \in \mathbb{N}}$ is a weakly
convergent sequence with limit $x$. Then there exists a subsequence
$(x_{n(j)})_{j \in \mathbb{N}}$ such that the sequence of arithmetic means of this
subsequence converges strongly to $x$, i.e.,

$$\lim_{m \to \infty} \frac{1}{m} \sum_{j=1}^{m} x_{n(j)} = x.$$




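Proof Since weakly convergent sequences are norm bounded, there is a constant
$M$ such that $\|x - x_n\| \leq M$ for all $n \in \mathbb{N}$. We define the subsequence
successively and start with $n(1) = 1$. Due to weak convergence of the given sequence,
there is an $n(2) \in \mathbb{N}$ such that
$|\langle x_{n(2)} - x, x_{n(1)} - x \rangle| < 1$. Suppose that $n(1), \ldots, n(k)$ have
been constructed. Since the given sequence converges weakly to $x$ there is an
$n(k+1) \in \mathbb{N}$ such that

$$|\langle x_{n(k+1)} - x, x_{n(i)} - x \rangle| < \frac{1}{k}, \qquad i = 1, \ldots, k.$$

Now, we estimate $\frac{1}{k} \sum_{i=1}^{k} x_{n(i)} - x$ in norm, taking the choice of
the subsequence into account in the last step:

$$\Bigl\| \frac{1}{k} \sum_{i=1}^{k} (x_{n(i)} - x) \Bigr\|^2
= \frac{1}{k^2} \sum_{i,j=1}^{k} \langle x_{n(i)} - x, x_{n(j)} - x \rangle$$
$$= \frac{1}{k^2} \Bigl[ \sum_{i=1}^{k} \langle x_{n(i)} - x, x_{n(i)} - x \rangle
+ 2 \operatorname{Re} \sum_{1 \leq i < j \leq k} \langle x_{n(j)} - x, x_{n(i)} - x \rangle \Bigr]$$
$$\leq \frac{1}{k^2} \Bigl[ k M^2 + 2 \sum_{j=2}^{k} \sum_{i=1}^{j-1} |\langle x_{n(j)} - x, x_{n(i)} - x \rangle| \Bigr]$$
$$< \frac{1}{k^2} \Bigl[ k M^2 + 2 \sum_{j=2}^{k} \sum_{i=1}^{j-1} \frac{1}{j-1} \Bigr]
\leq \frac{1}{k^2} (k M^2 + 2k).$$

Clearly, the upper bound in this estimate converges to zero as $k \to \infty$ and thus
proves strong convergence of the sequence of arithmetic means.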
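The simplest instance of this phenomenon is an orthonormal sequence, which converges weakly to zero although every element has norm one. The sketch below assumes the standard basis vectors $e_1, \ldots, e_m$ of $\mathbb{R}^m$ as a finite stand-in; in this case no passage to a subsequence is needed and the norm of the arithmetic mean is $1/\sqrt{m}$.

```python
import math

def mean_of_basis_vectors(m):
    """Arithmetic mean (e_1 + ... + e_m) / m, written out in R^m."""
    return [1.0 / m] * m

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in x))

# Norms of the arithmetic means for a few values of m (illustrative choice).
mean_norms = {m: norm(mean_of_basis_vectors(m)) for m in (1, 4, 100, 10000)}
```

The norms in `mean_norms` decrease like $1/\sqrt{m}$, in agreement with the bound $\frac{1}{k^2}(kM^2 + 2k)$ for the squared norm obtained in the proof.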

An immediate and very important consequence of the theorem of Banach and 
Saks is the conclusion that the weak and the strong closure of convex sets coincide. 


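Corollary 19.1 For convex sets $A$ of a Hilbert space $\mathcal{H}$, the strong closure
$\overline{A}^{s}$ and the weak closure $\overline{A}^{w}$ are the same:

$$A \subset \mathcal{H}, \ A \text{ convex} \ \Rightarrow \ \overline{A}^{s} = \overline{A}^{w}. \tag{19.4}$$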


Proof Since the strong topology is finer than the weak topology the closure with
respect to the strong topology is always contained in the closure with respect to the
weak topology: $\overline{A}^{s} \subseteq \overline{A}^{w}$. Convexity implies that both
closures actually agree.

Recall that a subset $A$ of a vector space is called convex if, and only if,
$tx + (1 - t)y \in A$ whenever $x, y \in A$ and $0 < t < 1$. Now take any point
$x \in \overline{A}^{w}$. Then there is a sequence $(x_n)_{n \in \mathbb{N}} \subset A$
which converges weakly to $x$. We can find a subsequence
$(x_{n(j)})_{j \in \mathbb{N}}$ such that the sequence $(\xi_m)_{m \in \mathbb{N}}$ of
arithmetic means

$$\xi_m = \frac{1}{m} \sum_{j=1}^{m} x_{n(j)}$$

converges strongly to $x$. Since $A$ is convex we know $\xi_m \in A$ for all
$m \in \mathbb{N}$. Hence $x$ is also the limit of a strongly convergent sequence of
points in $A$, thus $x \in \overline{A}^{s}$ and therefore
$\overline{A}^{w} \subseteq \overline{A}^{s}$. This proves equality of both sets.


The weak topology $\sigma$ on a Hilbert space $\mathcal{H}$ is not metrizable. However,
when restricted to the closed unit ball, it is metrizable according to the following
lemma which we mention without proof.


Lemma 19.4 On closed balls $\overline{B}_r(x_0)$ in a Hilbert space, the weak topology
$\sigma$ induces the topology of a metric space.

Gathering all the results of this section, the announced compactness of the closed 
unit ball with respect to the weak topology follows easily. 


Theorem 19.7 The closed balls $\overline{B}_r(x_0)$, $r > 0$, $x_0 \in \mathcal{H}$, in a
Hilbert space $\mathcal{H}$ are weakly compact.


Proof Since the weak topology is metrizable on these balls it suffices to show
sequential compactness. Given any sequence
$(x_n)_{n \in \mathbb{N}} \subset \overline{B}_r(x_0)$, there is a weakly convergent
subsequence by Theorem 19.3 and the weak limit of this subsequence belongs to the
ball $\overline{B}_r(x_0)$ because of Corollary 19.1. This proves sequential
compactness.


Actually closed balls in any reflexive Banach space, not only in Hilbert spaces, are 
weakly compact. This fact plays a very fundamental role in optimization problems
and will be discussed in more detail in Part C. 


19.3. Exercises 


1. Prove: On a finite-dimensional normed space, the weak and the norm topologies
coincide. And conversely, if the norm and the weak topologies of a normed space
coincide then this space has a finite dimension.

2. Fill in the details of the last step in the proof of Theorem 19.3.

Hints: It suffices to show $\lim_{j \to \infty} \langle x_{m(j)}, x \rangle = \langle y, x \rangle$
for $x \in \mathcal{H}_0$. For $x \in \mathcal{H}_0$ write
$\langle y - x_{m(j)}, x \rangle = \langle y - x_{m(j)}, x_\varepsilon \rangle + \langle y - x_{m(j)}, x - x_\varepsilon \rangle$
with a suitable choice of $x_\varepsilon \in V$, given any $\varepsilon > 0$.

3. For a subset $A$ of a Hilbert space $\mathcal{H}$ prove: If for every
$x \in \mathcal{H}$ there is a finite constant $C_x$ such that
$|\langle u, x \rangle| \leq C_x$ for all $u \in A$, then there is a constant
$C < \infty$ such that $\|u\| \leq C$ for all $u \in A$.

4. For a bounded linear map $T : X \to Y$ between two normed spaces $X$, $Y$ show:
a) If $B(x,r)$ denotes the open ball in $X$ of center $x$ and radius $r > 0$ then one has

$$r \|T\| \leq \sup_{\xi \in B(x,r)} \|T(\xi)\|_Y. \tag{19.5}$$

b) Conclude on the basis of (19.5): Given any $x \in X$, $r > 0$, and $0 < t < 1$
there is $\xi \in B(x,r)$ such that

$$t r \|T\| \leq \|T(\xi)\|_Y. \tag{19.6}$$

Hints: Recall the definition of the norm of $T$:
$\|T\| = \sup_{x' \in B(0,1)} \|T(x')\|_Y$ and conclude
$r \|T\| = \sup_{x' \in B(0,r)} \|T(x')\|_Y$. Next observe that for any $x \in X$ one
has $T(x') = \frac{1}{2} \bigl( T(x + x') - T(x - x') \bigr)$. Two straightforward
estimates now give (19.5).

5. Give an alternative indirect proof of the Banach-Steinhaus theorem by using
(19.6) in the basic iteration step.
Hints: If the given family $\mathcal{T}$ is not uniformly bounded we can choose a
strictly increasing sequence of positive numbers $R_n = R^n$, $n \in \mathbb{N}$,
$R > 1$, and then can find a sequence $T_n$ of elements in $\mathcal{T}$ such that
$\|T_n\| \geq R_n$. Now choose a decreasing sequence of positive numbers $r_n = r^n$,
$\frac{1}{R} < r < 1$, such that $r_n R_n \to \infty$ as $n \to \infty$. Next construct a
sequence of points $x_n$ as follows. Choose $x_0 = 0$ and $r_0 = 1$. If
$x_0, \ldots, x_k$ have been constructed find a point $x_{k+1} \in B(x_k, r_k)$ such
that $t r_k \|T_k\| \leq \|T_k(x_{k+1})\|$ using (19.6). As in the proof of Theorem
18.2.3 it follows that this sequence is a Cauchy sequence and thus converges to
a unique point $x \in X$ and one has $\|x - x_n\|_X \leq \frac{r_n}{1 - r}$. Now estimate
$\|T_n x\|_Y$ from below with a suitable choice of $t$ and thus show
$\|T_n x\|_Y \to \infty$ as $n \to \infty$.

For instance, one could use $R_n = 4^n$ and $r_n = 3^{-n}$ and $t = 2/3$ (compare [1]).


Reference 


1. Sokal AD. A really simple elementary proof of the uniform boundedness theorem. Am Math 
Mon. 2011;118:450-2. 


Chapter 20 
Linear Operators 


For a Hilbert space one can distinguish three structures, namely the linear, the
geometric, and the topological structures. This chapter begins with the study of mappings
which are compatible with these structures. In this first chapter on linear operators the 
topological structure is not taken into account, and accordingly the operators studied 
in this chapter are not considered to be continuous. Certainly, this will be relevant 
only in the case of infinite-dimensional Hilbert spaces, since on a finite dimensional 
vector space every linear function is continuous. 

Mappings which are compatible with the linear structure are called linear
operators. Sect. 20.1 is dedicated to the basic definitions and facts about
linear operators. Section 20.2 takes the geometrical structure into account insofar as 
consequences of the existence of an inner product are considered. Sect. 20.3 builds on 
the results of Sect. 20.2 and develops the basic theory of a special class of operators 
which play a fundamental role in quantum physics. These studies will be continued 
in later chapters. Finally, Sect. 20.4 discusses some first examples from quantum 
mechanics. 


20.1 Basic Facts 


Recall that any mapping is specified by giving the following data: a domain, a target 
space, and a rule which tells us how to assign to an element in the domain an element 
in the target space. When the domain and the target space carry a linear structure, 
one can consider those mappings which respect these structures. Such mappings are 
called linear. Accordingly one defines linear operators in Hilbert spaces. 


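Definition 20.1 Let $\mathcal{H}$ and $\mathcal{K}$ be two Hilbert spaces over the field
$\mathbb{K}$. A linear operator from $\mathcal{H}$ into $\mathcal{K}$ is a mapping
$A : D(A) \to \mathcal{K}$ where $D(A)$ is a linear subspace of $\mathcal{H}$ such that

$$A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 A(x_1) + \alpha_2 A(x_2) \qquad \forall x_i \in D(A), \ \forall \alpha_i \in \mathbb{K}, \ i = 1, 2.$$

The linear subspace $D(A)$ is called the domain of $A$.
If $\mathcal{K} = \mathcal{H}$, a linear operator $A$ from $\mathcal{H}$ into $\mathcal{K}$
is called a linear operator in $\mathcal{H}$.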


© Springer International Publishing Switzerland 2015 277 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_20 




Following tradition we write Ax instead of A(x) for x € D(A) for a linear operator 
A. 

In many studies of linear operators A from H into K, the following two subspaces 
play a distinguished role: The kernel or nullspace N(A) and the range ran A of A: 


$$N(A) = \{x \in D(A) : Ax = 0\},$$

$$\operatorname{ran} A = \{y \in \mathcal{K} : y = Ax \text{ for some } x \in D(A)\}.$$


It is very easy to show that N(A) is a linear subspace of D(A) and ran A is a linear 
subspace of K. 

Recall that a mapping $f : D(f) \to \mathcal{K}$, $D(f) \subset \mathcal{H}$, is called
injective if, and only if, $f(x_1) = f(x_2)$, $x_1, x_2 \in D(f)$, implies $x_1 = x_2$.
Thus, a linear operator $A$ from $\mathcal{H}$ into $\mathcal{K}$ is injective if, and
only if, its nullspace $N(A)$ is trivial, i.e., $N(A) = \{0\}$. Similarly, a linear
operator $A$ is surjective if, and only if, its range equals the target space, i.e.,
$\operatorname{ran} A = \mathcal{K}$.

Suppose that $A$ is an injective operator from $\mathcal{H}$ into $\mathcal{K}$. Then
there is a linear operator $B$ from $\mathcal{K}$ into $\mathcal{H}$ with domain
$D(B) = \operatorname{ran} A$ and $\operatorname{ran} B = D(A)$ such that $BAx = x$ for
all $x \in D(A)$. $B$ is called the inverse operator of $A$ and is usually written as
$A^{-1}$.

Let us consider some simple examples of operators in the Hilbert space
$\mathcal{H} = L^2([0,1], dx)$. First we specify several linear subspaces of
$\mathcal{H}$:

$$D_0 = \{f : (0,1) \to \mathbb{C} : f \text{ continuous}, \ \operatorname{supp} f \subset (0,1)\},$$
$$D_\alpha = \{\psi \in \mathcal{H} : \psi = x^{\alpha} \phi, \ \phi \in \mathcal{H}\},$$
$$D_\infty = \Bigl\{\psi \in \mathcal{H} : \tfrac{1}{x} \psi \in \mathcal{H}\Bigr\}.$$

Here $\alpha$ is some number $> 1$. It is clear that these three sets are actually
linear subspaces of $\mathcal{H} = L^2([0,1], dx)$ and that they are all different:

$$D_0 \subsetneq D_\alpha \subsetneq D_\infty.$$

As the rule which assigns to an element $\psi$ in any of these subspaces an element
in $L^2([0,1], dx)$, we take the multiplication with $\frac{1}{x}$. One checks that
indeed $\frac{1}{x} \psi \in L^2([0,1], dx)$ in all three cases and that this
assignment is linear. Thus we get three different linear operators:

$$A_0 : \ D(A_0) = D_0, \qquad D_0 \ni \psi \mapsto \tfrac{1}{x} \psi,$$
$$A_\alpha : \ D(A_\alpha) = D_\alpha, \qquad D_\alpha \ni \psi \mapsto \tfrac{1}{x} \psi,$$
$$A_\infty : \ D(A_\infty) = D_\infty, \qquad D_\infty \ni \psi \mapsto \tfrac{1}{x} \psi.$$

Note that for the multiplication with $\frac{1}{x}$ one cannot have, as a domain, the
whole Hilbert space $L^2([0,1], dx)$, since the function $\psi = 1$ is
square-integrable but $\frac{1}{x} \cdot 1$ is not square-integrable on the interval
$[0,1]$.

For a situation as in this example an appropriate terminology is introduced in the 
following definition. 


Definition 20.2 Let $\mathcal{H}$ and $\mathcal{K}$ be two Hilbert spaces over the field
$\mathbb{K}$ and $A$, $B$ two linear operators from $\mathcal{H}$ into $\mathcal{K}$.
$B$ is called an extension of $A$, in symbols $A \subseteq B$, if, and only if,
$D(A) \subseteq D(B)$ and $Ax = Bx$ for all $x \in D(A)$. Then $A$ is also called
a restriction of $B$, namely to the subspace $D(A)$ of $D(B)$: $A = B|_{D(A)}$.

Using this terminology we have for our example:

$$A_0 \subseteq A_\alpha \subseteq A_\infty.$$


In the Exercises, further examples of linear operators are discussed. These examples 
and the examples discussed above show a number of features one has to be aware of: 


1. Linear operators from a Hilbert space H into another Hilbert space K are not 
necessarily defined on all of H. The domain as a linear subspace of H is an 
essential part of the definition of a linear operator. 

2. Even if the assignment $\mathcal{H} \ni \psi \mapsto A\psi$ makes sense
mathematically the vector $\psi$ might not be in the domain of $A$. Consider for
example the case $\mathcal{H} = \mathcal{K} = L^2(\mathbb{R})$ and the function
$\psi \in L^2(\mathbb{R})$, $\psi(x) = \frac{1}{1 + x^2}$, and let $A$ stand for the
multiplication with the function $1 + x^2$. Then the multiplication of $\psi$ with
this function makes good mathematical sense and the result is the function $f = 1$,
but $\psi$ is not in the domain of this multiplication operator since
$1 \notin L^2(\mathbb{R})$. Thus there are linear operators which are only defined on
a proper subspace of the Hilbert space.

3. Whether or not a linear operator $A$ can be defined on all of the Hilbert space
$\mathcal{H}$ can be decided by investigating the set

$$\Bigl\{ \frac{\|A\psi\|}{\|\psi\|} : \psi \in D(A), \ \psi \neq 0 \Bigr\}.$$

If this set is not bounded, the operator $A$ is called an unbounded linear operator.
These operators are not continuous and in dealing with them special care has to
be taken (see later sections). If the above set is bounded the operator $A$ is called
a bounded linear operator. They respect the topological structure too, since they
are continuous.
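For the multiplication operator by $\frac{1}{x}$ on $L^2([0,1], dx)$ considered above, the set in item 3 is unbounded. The sketch below assumes the test vectors $\psi_n = \mathbf{1}_{[1/n,\,1]}$ (indicator functions, which lie in the largest of the three domains); this choice and the midpoint-rule integration are illustrative. The exact values are $\|\psi_n\|^2 = 1 - \frac{1}{n}$ and $\|\tfrac{1}{x}\psi_n\|^2 = n - 1$, so the quotient equals $\sqrt{n}$.

```python
import math

def integrate(f, a, b, steps=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def ratio(n):
    """||(1/x) psi_n|| / ||psi_n|| for psi_n = indicator function of [1/n, 1]."""
    a = 1.0 / n
    norm_psi_sq = integrate(lambda x: 1.0, a, 1.0)
    norm_image_sq = integrate(lambda x: 1.0 / (x * x), a, 1.0)
    return math.sqrt(norm_image_sq / norm_psi_sq)

# Quotients for n = 2, 10, 100; they grow like sqrt(n).
ratios = [ratio(n) for n in (2, 10, 100)]
```

Since the quotients grow without bound, multiplication by $\frac{1}{x}$ is an unbounded operator on any domain containing these vectors.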


The fact that the domain of a linear operator is not necessarily equal to the whole
space causes a number of complications. We mention two. Suppose that $A$, $B$ are
two linear operators from the Hilbert space $\mathcal{H}$ into the Hilbert space
$\mathcal{K}$. The addition of $A$ and $B$ can naturally only be defined on the domain
$D(A + B) = D(A) \cap D(B)$ by $(A + B)\psi = A\psi + B\psi$. However, even if both
domains are dense in $\mathcal{H}$ their intersection might be trivial, i.e.,
$D(A) \cap D(B) = \{0\}$, and then the resulting definition is not of interest.

Similarly, the natural definition of a product or the composition of two linear
operators can lead to a trivial result. Suppose $A$, $B$ are two linear operators in
the Hilbert space $\mathcal{H}$. Then their product $A \cdot B$ is naturally defined as
the composition on the domain
$D(A \cdot B) = \{\psi \in D(B) : B\psi \in D(A)\}$ by
$(A \cdot B)\psi = A(B(\psi))$ for all $\psi \in D(A \cdot B)$. But again it can happen
that $D(A \cdot B)$ is trivial though $D(A)$ and $D(B)$ are dense in $\mathcal{H}$.

In the next chapter on quadratic forms we will learn how one can improve on 
some of these difficulties. 

We conclude this section with a remark on the importance of the domain of a 
linear operator in a Hilbert space. As we have seen, a linear operator can be usually 
defined on many different domains. Which domain is relevant? This depends on the 
kind of problem in which the linear operator occurs. Large parts of the theory of 
linear operators in Hilbert spaces have been developed in connection with quantum 
mechanics where the linear operators are supposed to represent observables of a 
quantum mechanical system and as such they should be self-adjoint (a property to 
be addressed later). It turns out that typically linear operators are self-adjoint on 
precisely one domain. Also the spectrum of a linear operator depends in a very 
sensitive way on the domain. These statements will become obvious when we have 
developed the corresponding parts of the theory. 


20.2 Adjoints, Closed and Closable Operators 


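In the complex Hilbert space $\mathcal{H} = \mathbb{C}^n$ with inner product
$\langle x, y \rangle = \sum_{j=1}^{n} \overline{x_j}\, y_j$ for all
$x, y \in \mathbb{C}^n$, consider a matrix $A = (a_{ij})_{i,j=1,\ldots,n}$ with complex
coefficients. Let us calculate, for $x, y \in \mathbb{C}^n$,

$$\langle x, Ay \rangle = \sum_{i,j=1}^{n} \overline{x_i}\, a_{ij}\, y_j
= \sum_{i,j=1}^{n} \overline{\overline{a_{ij}}\, x_i}\, y_j = \langle A^* x, y \rangle$$

where we define the adjoint matrix $A^*$ by $(A^*)_{ji} = \overline{a_{ij}}$, i.e.,
$A^* = \overline{A}^{\,t}$ is the transposed complex conjugate matrix. This shows that
for any $n \times n$ matrix $A$ there exists an adjoint matrix $A^*$ such that for all
$x, y \in \mathbb{C}^n$ one has

$$\langle x, Ay \rangle = \langle A^* x, y \rangle.$$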
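This identity is easy to verify for a concrete matrix. The following sketch uses plain Python complex numbers; the matrix `A` and the vectors `x`, `y` are arbitrary illustrative values, and the inner product is taken antilinear in the first argument, as in the convention above.

```python
def inner(x, y):
    """Inner product on C^n, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def adjoint(A):
    """Transposed complex conjugate matrix: (A*)_{ji} = conj(a_{ij})."""
    n = len(A)
    return [[A[i][j].conjugate() for i in range(n)] for j in range(n)]

A = [[1 + 2j, 3j], [2 + 0j, 4 - 1j]]   # illustrative 2 x 2 matrix
x = [1 - 1j, 2j]
y = [3 + 0j, 1 + 1j]

lhs = inner(x, apply(A, y))            # <x, A y>
rhs = inner(apply(adjoint(A), x), y)   # <A* x, y>
```

Both sides agree, and applying `adjoint` twice returns the original matrix, which reflects $(A^*)^* = A$ in the finite-dimensional case.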


Certainly, in case of a linear operator in an infinite-dimensional Hilbert space
$\mathcal{H}$ the elementary calculation in terms of components of the vector is not
available. Nevertheless, we are going to show that for any densely defined linear
operator $A$ in a Hilbert space $\mathcal{H}$, there is a unique adjoint operator
$A^*$ such that

$$\langle x, Ay \rangle = \langle A^* x, y \rangle \qquad \forall x \in D(A^*), \ \forall y \in D(A) \tag{20.1}$$

holds.


Theorem 20.1 (Existence and Uniqueness of the Adjoint Operator) For every
densely defined linear operator $A$ in a Hilbert space $\mathcal{H}$ there is a unique
adjoint operator $A^*$ such that relation (20.1) holds. The domain of the adjoint is
defined as

$$D(A^*) = \{x \in \mathcal{H} : \exists\, C_x < \infty, \ |\langle x, Ay \rangle| \leq C_x \|y\| \ \ \forall y \in D(A)\}. \tag{20.2}$$

The adjoint is maximal among all linear operators $B$ which satisfy
$\langle Bx, y \rangle = \langle x, Ay \rangle$ for all $x \in D(B)$ and all $y \in D(A)$.


Proof In the Exercises it is shown that the set $D(A^*)$ is indeed a linear subspace
of $\mathcal{H}$ which contains at least the zero element of $\mathcal{H}$. Take any $x \in D(A^*)$ and
define a function $T_x : D(A) \to \mathbb{C}$ by $T_x(y) = \langle x, Ay\rangle$ for all $y \in D(A)$. Linearity
of $A$ implies that $T_x$ is a linear functional, and this functional is bounded by
$|T_x(y)| \le C_x \|y\|$, since $x \in D(A^*)$. Thus Theorems 16.3 and 16.4 apply and we get
a unique element $x^* \in \mathcal{H}$ such that $T_x(y) = \langle x^*, y\rangle$ for all $y \in D(A)$. This defines
an assignment $D(A^*) \ni x \mapsto x^* \in \mathcal{H}$ which we denote by $A^*$, i.e., $A^*x = x^*$ for
all $x \in D(A^*)$.

By definition of $T_x$, the mapping $D(A^*) \ni x \mapsto T_x \in \mathcal{H}'$ is antilinear; by our
convention the inner product is antilinear in the first argument. We conclude that
$A^* : D(A^*) \to \mathcal{H}$ is linear. By construction this linear operator $A^*$ satisfies relation
(20.1) and is called the adjoint of the operator $A$.

Suppose that $B$ is a linear operator in $\mathcal{H}$ which satisfies

$$\langle Bx, y\rangle = \langle x, Ay\rangle \quad \forall\, x \in D(B),\ \forall\, y \in D(A).$$

It follows immediately that $D(B) \subseteq D(A^*)$: if $x \in D(B)$ is given, take $C_x = \|Bx\|$
for the constant in the definition of $D(A^*)$. Therefore, for $x \in D(B)$ we have
$\langle Bx, y\rangle = \langle A^*x, y\rangle$ for all $y \in D(A)$, or $Bx - A^*x \in D(A)^\perp = \{0\}$ since $D(A)$
is dense. We conclude that $B$ is a restriction of the adjoint operator $A^*$, $B \subseteq A^*$.
Therefore the adjoint operator $A^*$ is the "maximal" operator which satisfies relation
(20.1).


Remark 20.1 The assumption in Theorem 20.1 that the operator is densely defined 
is essential for uniqueness of the adjoint. In case this assumption is not satisfied 
one can still define an adjoint, but in many ways. Some details are discussed in the 
Exercises. 

Sometimes it is more convenient to use the equivalent definition (20.6) in the 
Exercises for the domain of the adjoint of a densely defined linear operator. 

As Eqs. (20.1) and (20.2) clearly show, the adjoint depends in an essential way on the
inner product of the Hilbert space. Two different but topologically equivalent inner 
products (i.e., both inner products define the same topology) give rise to two different 
adjoints for a densely defined linear operator. Again, some details are discussed in 
the Exercises. 

There is a simple relation (which is quite useful in many applications) between the 
range of a densely defined linear operator and the null space of its adjoint. The 
relation reads as follows: 


Lemma 20.1 For a densely defined linear operator $A$ in a Hilbert space $\mathcal{H}$, the
orthogonal complement of the range of $A$ is equal to the null space of the adjoint $A^*$:

$$(\operatorname{ran} A)^\perp = N(A^*). \tag{20.3}$$




Proof The proof is simple. $y \in (\operatorname{ran} A)^\perp$ if, and only if, $0 = \langle y, Ax\rangle$ for all
$x \in D(A)$. This identity implies first that $y \in D(A^*)$ and then $0 = \langle A^*y, x\rangle$ for all
$x \in D(A)$. Since $D(A)$ is dense, we deduce $y \in (\operatorname{ran} A)^\perp$ if, and only if, $A^*y = 0$,
i.e., $y \in N(A^*)$.
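In finite dimensions relation (20.3) can be verified by hand or by a few lines of code. The sketch below uses a rank-one $2 \times 2$ matrix chosen purely for illustration; the helper names are assumptions of this example and do not come from the text.

```python
# Finite-dimensional check of (ran A)^perp = N(A*) for one sample matrix.

def inner(x, y):
    """<x, y>, antilinear in the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def matvec(A, v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

def adjoint(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

# Rank-one example: A x = (x_1 + i x_2) * (1, 1), so ran A is spanned by (1, 1).
A = [[1 + 0j, 1j], [1 + 0j, 1j]]
y = [1 + 0j, -1 + 0j]  # orthogonal to (1, 1)

# y is annihilated by the adjoint ...
A_star_y = matvec(adjoint(A), y)

# ... and is orthogonal to A x for a few sample vectors x.
samples = [[1 + 0j, 0j], [0j, 1 + 0j], [2 - 1j, 3j]]
overlaps = [inner(y, matvec(A, x)) for x in samples]
print(A_star_y, overlaps)
```

The sample vectors only probe the orthogonality; the statement for all $x$ follows from the linearity of $A$.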

As a first straightforward application we state: 


Lemma 20.2 Let $A$ be a densely defined injective linear operator in the Hilbert
space $\mathcal{H}$ with dense range $\operatorname{ran} A$. Then the adjoint of $A$ has an inverse which is given
by the adjoint of the inverse of $A$:

$$(A^*)^{-1} = (A^{-1})^*.$$


Proof See Exercises! 

In the definition of the adjoint of a densely defined linear operator A the explicit 
definition (20.2) of the domain plays an important role. Even in concrete examples 
it is not always straightforward to translate this explicit definition into a concrete 
description of the domain of the adjoint, and this in turn has the consequence that 
it is not a simple task to decide when a linear operator is equal to its adjoint, i.e., 
whether the linear operator is self-adjoint. We discuss a relatively simple example 
where we can obtain an explicit description of the domain of the adjoint. 


Example 20.1 In the Hilbert space $\mathcal{H} = L^2([0,1])$ consider the linear operator of
multiplication with the function $x^{-\alpha}$ for some $\alpha > 1/2$ on the domain

$$D(A) = \{f \in L^2([0,1]) : f = \chi_n \cdot g,\ g \in L^2([0,1]),\ n \in \mathbb{N}\},$$

where $\chi_n$ is the characteristic function of the subinterval $[\frac{1}{n}, 1]$, i.e., $D(A)$ consists
of those elements in $L^2([0,1])$ which vanish in some neighborhood of zero. It is easy to
verify that $\lim_{n\to\infty} \chi_n \cdot g = g$ in $L^2([0,1])$. Hence $A$ is densely defined and thus has a
unique adjoint. If $g \in D(A^*)$, then there is some constant $C$ such that for all $n \in \mathbb{N}$
and all $f \in L^2([0,1])$

$$|\langle g, A\chi_n \cdot f\rangle_2| \le C \|\chi_n \cdot f\|_2.$$

Now we use the fact that the multiplication of functions is commutative and obtain

$$\langle g, A\chi_n f\rangle_2 = \int_0^1 \overline{g(x)}\, x^{-\alpha} \chi_n(x) f(x)\, dx = \langle \chi_n \cdot Ag, f\rangle_2.$$

Since obviously $\|\chi_n \cdot f\|_2 \le \|f\|_2$, the estimate

$$|\langle \chi_n \cdot Ag, f\rangle_2| \le C \|f\|_2$$

results for all $n \in \mathbb{N}$ and all $f \in L^2([0,1])$. It follows that $\|\chi_n \cdot Ag\|_2 \le C$ for all
$n \in \mathbb{N}$, i.e.,

$$\int_{1/n}^1 |x^{-\alpha} g(x)|^2\, dx \le C^2 \quad \forall\, n \in \mathbb{N},$$

and thus in the limit $n \to \infty$ we deduce $x^{-\alpha} g \in L^2([0,1])$ and therefore the explicit
characterization of the domain of the adjoint reads

$$D(A^*) = \{g \in L^2([0,1]) : x^{-\alpha} g \in L^2([0,1])\}.$$




In this example the domain of the operator is properly contained in the domain of 
the adjoint. 
Other examples are studied in the Exercises. 
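The characterization of $D(A^*)$ in Example 20.1 can be probed numerically. The sketch below, with the illustrative choice $\alpha = 0.75$ and a simple midpoint rule (both assumptions of this example), evaluates the truncated integrals $\int_{1/n}^1 |x^{-\alpha} g(x)|^2\,dx$ for two test functions: the constant $g = 1$, for which the integrals grow without bound, and $g(x) = x^{\alpha}$, for which they stay below 1.

```python
def truncated_norm_sq(g, alpha, n, steps=20000):
    """Midpoint-rule approximation of the integral of |x^(-alpha) g(x)|^2 over [1/n, 1]."""
    a, b = 1.0 / n, 1.0
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += abs(x ** (-alpha) * g(x)) ** 2
    return total * h

alpha = 0.75  # illustrative choice with alpha > 1/2

def const_one(x):
    """g = 1 lies in L^2([0,1]), but x^(-alpha) * 1 does not."""
    return 1.0

def power(x):
    """g(x) = x^alpha: here x^(-alpha) * g(x) = 1, which lies in L^2([0,1])."""
    return x ** alpha

for n in (10, 100, 1000):
    print(n, truncated_norm_sq(const_one, alpha, n), truncated_norm_sq(power, alpha, n))
```

For $g = 1$ the exact value is $(n^{2\alpha-1} - 1)/(2\alpha - 1)$, which diverges as $n \to \infty$, so $g = 1 \notin D(A^*)$. The function $x^{\alpha}$ belongs to $D(A^*)$ but does not vanish near zero, hence it is not in $D(A)$.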

As is well known from analysis, the graph of a function often reveals important
details. This applies in particular to linear operators and their graphs. Let us recall
the definition of the graph $\Gamma(A)$ of a linear operator from a Hilbert space $\mathcal{H}$ into a
Hilbert space $\mathcal{K}$, with domain $D(A)$:

$$\Gamma(A) = \{(x, y) \in \mathcal{H} \times \mathcal{K} : x \in D(A),\ y = Ax\}. \tag{20.4}$$

Clearly, $\Gamma(A)$ is a linear subspace of $\mathcal{H} \times \mathcal{K}$. It is an important property of the operator
$A$ whether the graph is closed or not. Accordingly these operators are singled out in
the following definition.


Definition 20.3 A linear operator $A$ from $\mathcal{H}$ into $\mathcal{K}$ is called closed if, and only if,
its graph $\Gamma(A)$ is a closed subspace of $\mathcal{H} \times \mathcal{K}$.

When one has to use the concept of a closed linear operator the following 
characterization is very helpful. 


Theorem 20.2 Let $A$ be a linear operator from a Hilbert space $\mathcal{H}$ into a Hilbert
space $\mathcal{K}$. The following statements are equivalent.

a) $A$ is closed.

b) For every sequence $(x_n)_{n\in\mathbb{N}} \subset D(A)$ which converges to some $x \in \mathcal{H}$ and for
which the sequence of images $(Ax_n)_{n\in\mathbb{N}}$ converges to some $y \in \mathcal{K}$, it follows that
$x \in D(A)$ and $y = Ax$.

c) For every sequence $(x_n)_{n\in\mathbb{N}} \subset D(A)$ which converges weakly to some $x \in \mathcal{H}$ and
for which the sequence of images $(Ax_n)_{n\in\mathbb{N}}$ converges weakly to some $y \in \mathcal{K}$, it
follows that $x \in D(A)$ and $y = Ax$.

d) Equipped with the inner product

$$\langle (x, Ax), (y, Ay)\rangle = \langle x, y\rangle_{\mathcal{H}} + \langle Ax, Ay\rangle_{\mathcal{K}} \equiv \langle x, y\rangle_A \tag{20.5}$$

for $(x, Ax), (y, Ay) \in \Gamma(A)$, the graph $\Gamma(A)$ is a Hilbert space.


Proof The graph $\Gamma(A)$ is closed if, and only if, every point $(x, y) \in \mathcal{H} \times \mathcal{K}$ in
the closure of $\Gamma(A)$ actually belongs to this graph. And a point $(x, y)$ belongs to the
closure of $\Gamma(A)$ if, and only if, there is a sequence of points $(x_n, Ax_n) \in \Gamma(A)$
which converges to $(x, y)$ in the Hilbert space $\mathcal{H} \times \mathcal{K}$. The hypothesis in statement
(b) says that we consider a sequence in the graph of $A$ which converges to the point
$(x, y) \in \mathcal{H} \times \mathcal{K}$. The conclusion in this statement expresses the fact that this limit
point is a point in the graph of $A$. Hence statements (a) and (b) are equivalent.
Since a linear subspace of the Hilbert space $\mathcal{H} \times \mathcal{K}$ is closed if, and only if, it is
weakly closed, the same reasoning proves the equivalence of statements (a) and (c).
Finally consider the graph $\Gamma(A)$ as an inner product space with the inner product
$\langle \cdot, \cdot\rangle_A$. A Cauchy sequence in this space is a sequence $((x_n, Ax_n))_{n\in\mathbb{N}} \subset \Gamma(A)$ such
that for every $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that

$$\|(x_n, Ax_n) - (x_m, Ax_m)\|_A = \sqrt{\|x_n - x_m\|_{\mathcal{H}}^2 + \|Ax_n - Ax_m\|_{\mathcal{K}}^2} < \varepsilon \quad \forall\, n, m \ge n_0.$$

Completeness of this inner product space expresses the fact that such a sequence
converges to a point $(x, Ax) \in \Gamma(A)$.

According to the above Cauchy condition the sequence $(x_n)_{n\in\mathbb{N}} \subset D(A)$ is a
Cauchy sequence in the Hilbert space $\mathcal{H}$ and the sequence $(Ax_n)_{n\in\mathbb{N}}$ is a Cauchy
sequence in the Hilbert space $\mathcal{K}$. Thus these sequences converge to a point $x \in \mathcal{H}$,
respectively to a point $y \in \mathcal{K}$. Now $A$ is closed if, and only if, this limit point
$(x, y)$ belongs to the graph $\Gamma(A)$, i.e., if, and only if, this inner product space is
complete.

It is instructive to compare the concept of a closed linear operator with that of a
continuous linear operator. One can think of a closed operator $A$ from $\mathcal{H}$ into $\mathcal{K}$ as
a "quasicontinuous" operator in the following sense: If a sequence $(x_n)_{n\in\mathbb{N}} \subset D(A)$
converges in $\mathcal{H}$ and if the sequence of images $(Ax_n)_{n\in\mathbb{N}}$ converges in $\mathcal{K}$, then

$$\lim_{n\to\infty} Ax_n = A\bigl(\lim_{n\to\infty} x_n\bigr).$$

In contrast, continuity of $A$ means: Whenever the sequence $(x_n)_{n\in\mathbb{N}} \subset D(A)$
converges in $\mathcal{H}$, the sequence of images $(Ax_n)_{n\in\mathbb{N}}$ converges in $\mathcal{K}$ and the above relation
between both limits is satisfied.

As one would expect, not all linear operators are closed. A simple example of
such an operator which is not closed is the operator of multiplication with $x^{-\alpha}$ in
$L^2([0,1])$ discussed earlier in this section. To see that this operator is not closed take
the following sequence $(f_n)_{n\in\mathbb{N}} \subset D(A)$ defined by

$$f_n(x) = \begin{cases} x^{\alpha}, & \frac{1}{n} \le x \le 1, \\ 0, & 0 \le x < \frac{1}{n}. \end{cases}$$

The sequence of images then is

$$g_n(x) = Af_n(x) = \begin{cases} 1, & \frac{1}{n} \le x \le 1, \\ 0, & 0 \le x < \frac{1}{n}. \end{cases}$$

Clearly, both sequences have a limit $f$, respectively $g$, in $L^2([0,1])$, $f(x) = x^{\alpha}$,
$g(x) = 1$, for all $x \in [0,1]$. Obviously $g = Af$, but $f \notin D(A)$; hence this operator
is not closed.
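The convergence of both sequences can be confirmed by a direct computation. Assuming, as in the example, that $f_n$ equals $x^{\alpha}$ on $[\frac{1}{n}, 1]$ and vanishes elsewhere, and that $g_n$ is the characteristic function of $[\frac{1}{n}, 1]$, one finds:

```latex
\|f_n - f\|_2^2 = \int_0^{1/n} x^{2\alpha}\,dx = \frac{n^{-(2\alpha+1)}}{2\alpha+1} \longrightarrow 0,
\qquad
\|g_n - g\|_2^2 = \int_0^{1/n} 1\,dx = \frac{1}{n} \longrightarrow 0
\qquad (n \to \infty).
```

Thus $(f_n, Af_n) \to (f, g)$ in $L^2([0,1]) \times L^2([0,1])$, while the limit point does not belong to the graph because $f$ does not vanish in any neighborhood of zero.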

Thus there are linear operators which are not closed. Some of these linear operators 
might have extensions which are closed. This is addressed in the following definition. 


Definition 20.4 A linear operator $A$ from $\mathcal{H}$ into $\mathcal{K}$ is called closable if, and only
if, $A$ has an extension $B$ which is a closed linear operator from $\mathcal{H}$ into $\mathcal{K}$.

The closure of a linear operator $A$, denoted by $\overline{A}$, is the smallest closed extension
of $A$, if it exists.




Naturally, in the definition of closure the natural ordering among linear operators is
used. This means: If $B$ is a closed extension of $A$, then $\overline{A} \subseteq B$.

For densely defined linear operators one has a convenient characterization of those 
operators which are closable as we learn in the following 


Theorem 20.3 (Closability of Densely Defined Operators) Suppose $A$ is a densely
defined linear operator in the Hilbert space $\mathcal{H}$. It follows that

a) If $B$ is an extension of $A$, then the adjoint $A^*$ of $A$ is an extension of the adjoint
$B^*$ of $B$: $A \subseteq B \Rightarrow B^* \subseteq A^*$.

b) The adjoint $A^*$ of $A$ is closed: $\overline{A^*} = A^*$.

c) $A$ is closable if, and only if, its adjoint $A^*$ is densely defined, and in this case the
closure of $A$ is equal to the bi-adjoint $A^{**} = (A^*)^*$ of $A$: $\overline{A} = A^{**}$.


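Proof Suppose $A \subseteq B$ and $y \in D(B^*)$, i.e., there is a constant $C$ such that
$|\langle y, Bx\rangle| \le C\|x\|$ for all $x \in D(B)$. Since $D(A) \subseteq D(B)$ and $Ax = Bx$
for all $x \in D(A)$, we deduce $y \in D(A^*)$ (one can use the same constant $C$)
and $\langle B^*y, x\rangle = \langle y, Bx\rangle = \langle y, Ax\rangle = \langle A^*y, x\rangle$ for all $x \in D(A)$. Hence
$B^*y - A^*y \in D(A)^\perp = \{0\}$ since $D(A)$ is dense. Therefore $D(B^*) \subseteq D(A^*)$
and $B^*y = A^*y$ for all $y \in D(B^*)$, i.e., $B^* \subseteq A^*$.

In order to prove part (b) take any sequence $((y_n, A^*y_n))_{n\in\mathbb{N}} \subset \Gamma(A^*)$ which
converges in $\mathcal{H} \times \mathcal{H}$ to a point $(y, z)$. It follows that, for all $x \in D(A)$,

$$\langle y, Ax\rangle = \lim_{n\to\infty} \langle y_n, Ax\rangle = \lim_{n\to\infty} \langle A^*y_n, x\rangle = \langle z, x\rangle,$$

and we deduce $|\langle y, Ax\rangle| \le \|z\|\,\|x\|$ for all $x \in D(A)$, hence $y \in D(A^*)$ and
$\langle z, x\rangle = \langle y, Ax\rangle = \langle A^*y, x\rangle$ for all $x \in D(A)$ and thus $z = A^*y$, i.e., $(y, z) =
(y, A^*y) \in \Gamma(A^*)$. Therefore $A^*$ is closed.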

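Finally, for the proof of part (c), observe that a linear operator $A$ is closable if,
and only if, the closure $\overline{\Gamma(A)}$ of its graph $\Gamma(A)$ is the graph of a linear operator.
Furthermore, an easy exercise shows that a linear subspace $M \subseteq \mathcal{H} \times \mathcal{H}$ is the graph
of a linear operator in $\mathcal{H}$ if, and only if,

$$(0, y) \in M \ \Rightarrow\ y = 0.$$

We know (Corollary 16.1): $\overline{\Gamma(A)} = (\Gamma(A)^\perp)^\perp$. Now

$$(x, y) \in \Gamma(A)^\perp \ \Leftrightarrow\ 0 = \langle (x, y), (z, Az)\rangle_{\mathcal{H}\times\mathcal{H}} = \langle x, z\rangle + \langle y, Az\rangle \quad \forall\, z \in D(A),$$

i.e., $\Leftrightarrow\ y \in D(A^*),\ x = -A^*y$. Similarly,

$$(u, v) \in (\Gamma(A)^\perp)^\perp \ \Leftrightarrow\ 0 = \langle (u, v), (-A^*y, y)\rangle_{\mathcal{H}\times\mathcal{H}} = \langle u, -A^*y\rangle + \langle v, y\rangle \quad \forall\, y \in D(A^*),$$

i.e., $\Leftrightarrow\ \langle u, A^*y\rangle = \langle v, y\rangle\ \ \forall\, y \in D(A^*)$. This shows: $(0, v) \in \overline{\Gamma(A)} \Leftrightarrow v \in D(A^*)^\perp$.
Therefore $\overline{\Gamma(A)}$ is the graph of a linear operator if, and only if, $D(A^*)^\perp = \{0\}$ and
thus we conclude by Corollary 16.2 that $\overline{\Gamma(A)}$ is the graph of a linear operator if,
and only if, $D(A^*)$ is dense.

Now suppose that $A^*$ is densely defined. Then we know that its adjoint $(A^*)^* =
A^{**}$ is well defined. The above calculations show

$$(u, v) \in \overline{\Gamma(A)} \ \Leftrightarrow\ \langle u, A^*y\rangle = \langle v, y\rangle \quad \forall\, y \in D(A^*),$$

i.e., $\Leftrightarrow\ u \in D(A^{**}),\ v = A^{**}u$, and therefore

$$\overline{\Gamma(A)} = \Gamma(A^{**}).$$

Since the closure $\overline{A}$ is defined through the relation $\Gamma(\overline{A}) = \overline{\Gamma(A)}$ this proves

$$\Gamma(\overline{A}) = \Gamma(A^{**})$$

and the proof is complete.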


20.3 Symmetric and Self-Adjoint Operators


In the previous section, for densely defined linear operators in a Hilbert space the
adjoints were defined. In general, one cannot compare a densely defined linear
operator $A$ with its adjoint $A^*$. However, there are important classes of such operators
where such a comparison is possible. If we can compare the operators $A$ and $A^*$,
there are two prominent cases to which we direct our attention in this section: (1)
The adjoint $A^*$ is equal to $A$. (2) The adjoint is an extension of $A$. These two classes
of operators are distinguished by proper names according to the following definition.


Definition 20.5 A densely defined linear operator $A$ in a Hilbert space $\mathcal{H}$ is called

a) symmetric if, and only if, $A \subseteq A^*$;

b) self-adjoint if, and only if, $A = A^*$;

c) essentially self-adjoint if, and only if, $A$ is symmetric and its closure $\overline{A}$ is self-adjoint.


In the definition of an essentially self-adjoint operator we obviously rely on the 
following result. 


Corollary 20.1 A symmetric operator $A$ is always closable. Its closure is the
bi-adjoint of $A$: $\overline{A} = A^{**}$, and the closure is symmetric too.


Proof For a symmetric operator the adjoint is densely defined so that Theorem 20.3
applies. Since the adjoint is always closed, the relation $A \subseteq A^*$ implies $\overline{A} \subseteq A^*$.
As the adjoint of the closure equals the adjoint, $(\overline{A})^* = A^*$, the closure is
symmetric: $\overline{A} = A^{**} \subseteq (\overline{A})^*$.

Another simple but useful observation about the relation between closure and 
adjoint is 


Corollary 20.2 Let $A$ be a densely defined linear operator with closure $\overline{A}$. Then
the adjoint of the closure is equal to the adjoint:

$$(\overline{A})^* = A^*.$$


Proof The simple proof is left as an exercise. 




Thus for symmetric operators we can assume that they are closed. By definition,
the closure of an essentially self-adjoint operator is self-adjoint. From the discussion
above we deduce for such an operator that

$$A^* = A^{**} = \overline{A},$$

and conversely, if this relation holds for a symmetric operator it is essentially
self-adjoint.

Certainly, a self-adjoint operator is essentially self-adjoint ($A^* = A$ implies
$A^{**} = A^*$). However, in general, an essentially self-adjoint operator is not
self-adjoint, but such an operator has a unique self-adjoint extension, namely its closure.
The proof is easy: Suppose that $B$ is a self-adjoint extension of $A$. $A \subseteq B$ implies
first $B^* \subseteq A^*$ and then $A^{**} \subseteq B^{**}$. Since $B^* = B$ and $A^* = A^{**}$ we get
$B = B^* \subseteq A^* = A^{**} \subseteq B^{**} = B$, i.e., $B = A^{**} = \overline{A}$.

The importance of the concept of an essentially self-adjoint operator is based on
the fact that an operator can be essentially self-adjoint on many different domains
while it is self-adjoint on precisely one domain. The flexibility in the domain of
an essentially self-adjoint operator is used often in the construction of self-adjoint
operators, for instance, in quantum mechanics. For differential operators such as
Schrödinger operators it is not very difficult to find a dense domain $D_0$ on which
this operator is symmetric. If one succeeds in showing that the operator is essentially
self-adjoint on $D_0$, one knows that it has a unique self-adjoint extension, namely its
closure. This requires that the domain $D_0$ is large enough. If one only knows that the
operator is symmetric on this domain, the problem of constructing all different
self-adjoint extensions arises. Even more flexibility in the initial choice of the domain is
assured through the use of the concept of core of a closed operator.


Definition 20.6 Suppose that $A$ is a closed linear operator. A subset $D$ of the domain
$D(A)$ of $A$ is called a core of $A$ if, and only if, the closure of the restriction of $A$ to
the linear subspace $\operatorname{lin} D$ is equal to $A$: $\overline{A|_{\operatorname{lin} D}} = A$.

It is important to be aware of the fine differences of the various classes of linear 
operators we have introduced. The following table gives a useful survey for a densely 
defined linear operator A. 


Operator A                   Properties
Symmetric                    $A \subseteq \overline{A} = A^{**} \subseteq A^*$
Closed and symmetric         $A = \overline{A} = A^{**} \subseteq A^*$
Essentially self-adjoint     $A \subseteq \overline{A} = A^{**} = A^*$
Self-adjoint                 $A = \overline{A} = A^{**} = A^*$

The direct proof of self-adjointness of a given linear operator is usually impossible.
Fortunately there are several quite general criteria available. Below the two basic
characterizations of self-adjointness are proven.


Theorem 20.4 (Self-adjointness) For a symmetric operator $A$ in a Hilbert space
$\mathcal{H}$, the following statements are equivalent.

a) $A$ is self-adjoint: $A^* = A$;
b) $A$ is closed and $N(A^* \pm iI) = \{0\}$;
c) $\operatorname{ran}(A \pm iI) = \mathcal{H}$.


Proof We proceed with the equivalence proof in the following order: (a) $\Rightarrow$ (b) $\Rightarrow$
(c) $\Rightarrow$ (a).

Suppose $A$ is self-adjoint. Then $A$ is certainly closed. Consider $\phi_\pm \in N(A^* \pm iI)$.
$A^*\phi_\pm = \mp i\phi_\pm$ implies

$$\mp i\langle \phi_\pm, \phi_\pm\rangle = \langle \phi_\pm, A^*\phi_\pm\rangle = \langle A\phi_\pm, \phi_\pm\rangle = \langle A^*\phi_\pm, \phi_\pm\rangle = \pm i\langle \phi_\pm, \phi_\pm\rangle,$$

and thus $\langle \phi_\pm, \phi_\pm\rangle = \|\phi_\pm\|^2 = 0$, i.e., $\phi_\pm = 0$.

Next assume that $A$ is a closed symmetric operator such that $N(A^* \pm iI) = \{0\}$.
Relation (20.3) gives $N(A^* + \overline{z}I) = (\operatorname{ran}(A + zI))^\perp$ and therefore $\overline{\operatorname{ran}(A \pm iI)} =
(N(A^* \mp iI))^\perp = \mathcal{H}$. Hence for the proof of (c) it suffices to show that $\operatorname{ran}(A \pm iI)$
is closed. Suppose $x = \lim_{n\to\infty} x_n$, $x_n = (A \pm iI)y_n$, $y_n \in D(A)$, $n \in \mathbb{N}$. It is
straightforward to calculate, for all $n, m \in \mathbb{N}$,

$$\|x_n - x_m\|^2 = \|(A \pm iI)(y_n - y_m)\|^2 = \|Ay_n - Ay_m\|^2 + \|y_n - y_m\|^2.$$

Therefore, with $(x_n)_{n\in\mathbb{N}}$ also, the two sequences $(y_n)_{n\in\mathbb{N}}$ and $(Ay_n)_{n\in\mathbb{N}}$ are Cauchy
sequences in the Hilbert space $\mathcal{H}$ and thus they converge too, to $y$, respectively $z$.
Since $A$ is closed, $y \in D(A)$ and $z = Ay$, hence $x = (A \pm iI)y \in \operatorname{ran}(A \pm iI)$,
and this range is closed. Statement (c) follows.

Finally assume (c). Since $A$ is symmetric it suffices to show that the domain
of the adjoint is contained in the domain of $A$. Consider any $y \in D(A^*)$, then
$(A^* - iI)y \in \mathcal{H}$. Hypothesis (c) implies that there is some $\xi \in D(A)$ such that
$(A^* - iI)y = (A - iI)\xi = (A^* - iI)\xi$, hence $(A^* - iI)(y - \xi) = 0$ or
$y - \xi \in N(A^* - iI) = (\operatorname{ran}(A + iI))^\perp = \{0\}$. This proves $y = \xi \in D(A)$ and
finally $D(A^*) = D(A)$, i.e., $A^* = A$.

The proof of this theorem has also established the following relation between
the closure of the range of a symmetric operator and the range of the closure of the
operator:

$$\overline{\operatorname{ran}(A \pm iI)} = \operatorname{ran}(\overline{A} \pm iI).$$

Together with Corollary 20.2 this observation implies


Corollary 20.3 For a symmetric operator $A$ in a Hilbert space $\mathcal{H}$ the following
statements are equivalent:

a) $A$ is essentially self-adjoint;
b) $N(A^* \pm iI) = \{0\}$;
c) $\overline{\operatorname{ran}(A \pm iI)} = \mathcal{H}$.


In particular one knows for a closed symmetric operator that

$$\operatorname{ran}(A + iI) \quad \text{and} \quad \operatorname{ran}(A - iI)$$

are closed linear subspaces of $\mathcal{H}$. Without proof we mention that a closed symmetric
operator has self-adjoint extensions if, and only if, the orthogonal complements of
these subspaces have the same dimension:

$$\dim (\operatorname{ran}(A + iI))^\perp = \dim (\operatorname{ran}(A - iI))^\perp$$

or

$$\dim N(A^* + iI) = \dim N(A^* - iI).$$

The main difficulty in applying these criteria for self-adjointness is that one usually
does not know the explicit form of the adjoint so that it is not obvious at all how to check
whether $N(A^* \pm iI)$ is trivial. Later, in connection with our study of Schrödinger
operators we will learn how in special cases one can master this difficulty.
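A finite-dimensional toy case may help to see what the criteria of Theorem 20.4 express. In $\mathbb{C}^2$ every operator is bounded and closed, so the conditions reduce to the invertibility of $A^* \pm iI$. The sketch below (matrices and helper names are illustrative assumptions, not taken from the text) compares a Hermitian matrix with a non-Hermitian one.

```python
def det2(M):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def shift(M, z):
    """Return M + z*I for a 2x2 matrix M."""
    return [[M[0][0] + z, M[0][1]], [M[1][0], M[1][1] + z]]

def adjoint(M):
    """Conjugate transpose of a 2x2 matrix."""
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

# A Hermitian matrix: A* = A, so N(A* + iI) and N(A* - iI) must be trivial.
hermitian = [[2 + 0j, 1 - 1j], [1 + 1j, -1 + 0j]]

# A non-Hermitian matrix with eigenvalue i: here N(A* + iI) is nontrivial.
non_hermitian = [[1j, 0j], [0j, 0j]]

for name, M in (("hermitian", hermitian), ("non_hermitian", non_hermitian)):
    d_plus = det2(shift(adjoint(M), 1j))    # det(A* + iI)
    d_minus = det2(shift(adjoint(M), -1j))  # det(A* - iI)
    print(name, d_plus, d_minus)
```

A vanishing determinant signals a nontrivial null space, which can only happen when the matrix fails to be Hermitian, since Hermitian matrices have real eigenvalues. The genuinely infinite-dimensional difficulties (domains, closedness) are of course invisible in this toy case.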


20.4 Examples 


The concepts and the results of the previous three sections are illustrated by several 
examples which are discussed in some detail. 


20.4.1 Operator of Multiplication 


Suppose that $g : \mathbb{R}^n \to \mathbb{C}$ is a continuous (but not necessarily bounded) function.
We want to define the multiplication with $g$ as a linear operator in the Hilbert space
$\mathcal{H} = L^2(\mathbb{R}^n)$. To this end the natural domain

$$D_g = \{f \in L^2(\mathbb{R}^n) : g \cdot f \in L^2(\mathbb{R}^n)\}$$

is introduced. With this domain we denote the operator of multiplication with $g$ by
$M_g$, $(M_g f)(x) = g(x) f(x)$ for almost all $x \in \mathbb{R}^n$ and all $f \in D_g$. This operator is
densely defined since its domain contains the dense subspace

$$D_0 = \{\chi_r \cdot f : f \in L^2(\mathbb{R}^n),\ r > 0\} \subset L^2(\mathbb{R}^n).$$

Here $\chi_r$ denotes the characteristic function of the closed ball of radius $r$ and center 0.
The reader is asked to prove this statement as an exercise. As a continuous function,
$g$ is bounded on the closed ball with radius $r$, by a constant $C_r$ let us say. Thus the
elementary estimate

$$\|g \chi_r f\|_2^2 = \int_{|x| \le r} |g(x)|^2 |f(x)|^2\, dx \le C_r^2 \|f\|_2^2 \quad \forall\, f \in L^2(\mathbb{R}^n),\ \forall\, r > 0$$

proves $D_0 \subseteq D_g$ and the operator $M_g$ is densely defined. In order to determine
the adjoint of $M_g$, take any $h \in D(M_g^*)$; then $h^* = M_g^* h \in L^2(\mathbb{R}^n)$ and for all
$f \in D_g$ one has $\langle h, M_g f\rangle = \langle h^*, f\rangle$, in particular for all $\chi_r f$, $f \in L^2(\mathbb{R}^n)$, $r > 0$,
$\langle h^*, \chi_r f\rangle = \langle h, M_g \chi_r f\rangle$. Naturally, the multiplication with $\chi_r$ commutes with the
multiplication with $g$, thus

$$\langle h, M_g \chi_r f\rangle = \langle h, \chi_r M_g f\rangle = \langle \chi_r h, M_g f\rangle = \langle \chi_r M_{\overline{g}} h, f\rangle,$$

or $\langle \chi_r h^*, f\rangle = \langle \chi_r M_{\overline{g}} h, f\rangle$ for all $f \in L^2(\mathbb{R}^n)$ and all $r > 0$. It follows that
$\chi_r h^* = \chi_r \overline{g} h$ for all $r > 0$ and therefore

$$\int_{|x| \le r} |(\overline{g} h)(x)|^2\, dx = \int_{|x| \le r} |h^*(x)|^2\, dx \le \|h^*\|_2^2$$

for all $r > 0$. We deduce $\overline{g} \cdot h \in L^2(\mathbb{R}^n)$ and $h^* = \overline{g} \cdot h = M_{\overline{g}} h$, hence $h \in D_{\overline{g}} = D_g$
and $M_g^* = M_{\overline{g}}$. This shows that the adjoint of the operator of multiplication with
the continuous function $g$ is the multiplication with the complex conjugate function
$\overline{g}$, on the same domain. Therefore, this multiplication operator is always closed. In
particular the operator of multiplication with a real valued continuous function is
self-adjoint.

Our arguments are valid not only for continuous functions but for all measurable
functions $g$ which are bounded on all compact subsets of $\mathbb{R}^n$. In this case the operator
of multiplication with $g$ is the prototype of a self-adjoint operator, as we will learn
in later chapters.


20.4.2 Momentum Operator 


As a simple model of the momentum operator in a one-dimensional quantum
mechanical system we discuss the operator

$$P = i\frac{d}{dx}$$

in the Hilbert space $\mathcal{H} = L^2([0,1])$. Recall that a function $f \in L^2([0,1])$ is called
absolutely continuous if, and only if, there is a function $g \in L^1([0,1])$ such that for
all $0 \le x_0 < x \le 1$ one has $f(x) - f(x_0) = \int_{x_0}^x g(y)\, dy$. It follows that $f$ has a
derivative $f' = g$ almost everywhere. Initially we are going to use as a domain for
$P$ the subspace

$$D = \{f \in L^2([0,1]) : f \text{ is absolutely continuous},\ f' \in L^2([0,1])\}.$$

This subspace is dense in $L^2([0,1])$ and clearly $P$ is well defined by

$$(Pf)(x) = i f'(x) \quad \text{for almost all } x \in [0,1],\ \forall\, f \in D.$$

For arbitrary $f, g \in D$ one has

$$\langle f, Pg\rangle_2 = \int_0^1 \overline{f(x)}\, i g'(x)\, dx = i\, \overline{f(x)} g(x)\Big|_0^1 - i\int_0^1 \overline{f'(x)}\, g(x)\, dx = i\bigl[\overline{f(1)} g(1) - \overline{f(0)} g(0)\bigr] + \langle Pf, g\rangle_2.$$

Hence $P$ will be symmetric on all domains $D'$ for which

$$\overline{f(1)} g(1) - \overline{f(0)} g(0) = 0 \quad \forall\, f, g \in D'$$

holds. These are the subspaces

$$D_\gamma = \{f \in D : f(1) = e^{i\gamma} f(0)\}, \quad \gamma \in \mathbb{R},$$

as one sees easily. In this way we have obtained a one parameter family of symmetric
operators

$$P_\gamma = P|_{D_\gamma}, \quad D(P_\gamma) = D_\gamma, \quad \gamma \in \mathbb{R}.$$

These operators are all extensions of the symmetric operator $P_\infty = P|_{D_\infty}$, $D_\infty =
\{f \in D : f(1) = f(0) = 0\}$.

Lemma 20.3 For all $\gamma \in \mathbb{R}$ the symmetric operator $P_\gamma$ is self-adjoint.

Proof For $f \in D(P_\gamma^*)$ we know $f^* = P_\gamma^* f \in L^2([0,1]) \subset L^1([0,1])$, hence
$h_c(x) = \int_0^x f^*(y)\, dy + c$ is absolutely continuous and satisfies $h_c'(x) = f^*(x)$
almost everywhere. Clearly $h_c$ belongs to $L^2([0,1])$, thus $h_c \in D$. Now calculate,
for all $g \in D_\gamma$:

$$\langle f, P_\gamma g\rangle_2 = \langle f^*, g\rangle_2 = \int_0^1 \overline{h_c'(x)}\, g(x)\, dx = \overline{h_c(x)} g(x)\Big|_0^1 - \int_0^1 \overline{h_c(x)}\, g'(x)\, dx$$

$$= \bigl[\overline{h_c(1)}\, e^{i\gamma} - \overline{h_c(0)}\bigr] g(0) - \langle i h_c, P_\gamma g\rangle_2, \quad \text{or}$$

$$\langle f + i h_c, P_\gamma g\rangle_2 = \bigl[\overline{h_c(1)}\, e^{i\gamma} - \overline{h_c(0)}\bigr] g(0) \quad \forall\, g \in D_\gamma.$$

Observe that the subspace $\{u \in L^2([0,1]) : u = g',\ g \in D_\gamma,\ g(0) = 0\}$ is dense in
$L^2([0,1])$. This implies $f(x) + i h_c(x) = 0$ almost everywhere and thus $f \in D$ and
$i f' = h_c' = f^*$. From the above identity we now deduce $\bigl[\overline{h_c(1)}\, e^{i\gamma} - \overline{h_c(0)}\bigr] g(0) = 0$
for all $g \in D_\gamma$, and it follows that $h_c(1) e^{-i\gamma} - h_c(0) = 0$, hence $f \in D_\gamma$. Since
$f \in D(P_\gamma^*)$ was arbitrary, this shows that $D(P_\gamma^*) = D_\gamma$, and $P_\gamma^* f = P_\gamma f$ for all
$f \in D_\gamma$. Hence, for every $\gamma \in \mathbb{R}$, the operator $P_\gamma$ is self-adjoint.

We conclude that the operator $P_\infty$ has a one parameter family of self-adjoint
extensions $P_\gamma$, $\gamma \in \mathbb{R}$. Our argument shows moreover that every self-adjoint extension
of $P_\infty$ is of this form.
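The role of the boundary condition can be illustrated with explicit functions. The sketch below assumes the convention $P = i\,d/dx$ and the condition $f(1) = e^{i\gamma} f(0)$ for the domain $D_\gamma$; the value of `gamma`, the test functions, and all helper names are illustrative choices. The exponentials $f_k(x) = e^{i(\gamma + 2\pi k)x}$ satisfy this boundary condition, make the boundary term vanish, and are eigenfunctions of $P$ with the real eigenvalues $-(\gamma + 2\pi k)$.

```python
import cmath
import math

gamma = 0.7  # illustrative boundary phase; any real number would do

def eigenfunction(k):
    """f_k(x) = exp(i (gamma + 2 pi k) x), so that f_k(1) = e^{i gamma} f_k(0)."""
    return lambda x: cmath.exp(1j * (gamma + 2 * math.pi * k) * x)

def boundary_term(f, g):
    """i [conj(f(1)) g(1) - conj(f(0)) g(0)], the obstruction to symmetry of P."""
    return 1j * (f(1.0).conjugate() * g(1.0) - f(0.0).conjugate() * g(0.0))

def apply_P(f, x, h=1e-6):
    """(P f)(x) = i f'(x), with f' approximated by a central difference."""
    return 1j * (f(x + h) - f(x - h)) / (2 * h)

f, g = eigenfunction(0), eigenfunction(3)
bt_in_domain = boundary_term(f, g)          # vanishes up to rounding

def identity_fn(x):
    """u(x) = x lies in D but violates the boundary condition."""
    return complex(x)

bt_outside = boundary_term(identity_fn, identity_fn)  # equals i, not zero

x0 = 0.3
ratio = apply_P(g, x0) / g(x0)   # close to -(gamma + 6 pi), a real number
print(bt_in_domain, bt_outside, ratio)
```

The nonzero boundary term for $u(x) = x$ shows that $P$ is not symmetric on the full domain $D$, which is why a boundary condition has to be imposed.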


20.4.3 Free Hamilton Operator 


In suitable units the Hamilton operator of a free quantum mechanical particle in
Euclidean space $\mathbb{R}^3$ is

$$H_0 = -\Delta_3$$

on a suitable domain $D(H_0) \subset L^2(\mathbb{R}^3)$. Recall Plancherel's Theorem 10.7. It says
that the Fourier transform $\mathcal{F}_2$ is a "unitary" mapping of the Hilbert space $L^2(\mathbb{R}^3)$.
Theorem 10.3 implies that for all $f$ in

$$D(H_0) = \{f = \mathcal{F}_2 \tilde{f} \in L^2(\mathbb{R}^3) : p^2 \tilde{f}(p) \in L^2(\mathbb{R}^3)\} = H^2(\mathbb{R}^3)$$

one has

$$H_0 f = \mathcal{F}_2(M_{p^2} \tilde{f}).$$

Here $\tilde{f}$ denotes the inverse Fourier transform of $f$. Since we know from our first
example that the operator of multiplication with the real valued function $g(p) = p^2$
is self-adjoint on the domain $\{g \in L^2(\mathbb{R}^3) : p^2 g \in L^2(\mathbb{R}^3)\}$, unitarity of $\mathcal{F}_2$ implies
that $H_0$ is self-adjoint on the domain $D(H_0)$ specified above. This will be evident
when we have studied unitary operators in some detail later (Sect. 22.2).


20.5 Exercises 


1. Let $g : \mathbb{R} \to \mathbb{C}$ be a bounded continuous function. Denote by $M_g$ the multiplication
of a function $f : \mathbb{R} \to \mathbb{C}$ with the function $g$. Show: $M_g$ defines a linear operator
in the Hilbert space $\mathcal{H} = L^2(\mathbb{R})$ with domain $D(M_g) = L^2(\mathbb{R})$.

2. Denote by $D_n$ the space of all continuous functions $f : \mathbb{R} \to \mathbb{C}$ for which

$$p_{0,n}(f) = \sup_{x \in \mathbb{R}} (1 + x^2)^{n/2} |f(x)|$$

is finite. Show: $D_{n+1} \subsetneq D_n$ for $n = 0, 1, 2, \dots$ and for $n = 1, 2, \dots$ $D_n$ is a dense
linear subspace of the Hilbert space $\mathcal{H} = L^2(\mathbb{R})$. Denote by $Q$ multiplication
with the variable $x$, i.e., $(Qf)(x) = x f(x)$ for all $x \in \mathbb{R}$. Show that $Q$ defines
a linear operator $Q_n$ in $L^2(\mathbb{R})$ with domain $D(Q_n) = D_n$ and $Q_{n+1} \subseteq Q_n$ for
$n = 2, 3, \dots$.

3. Denote by $C^k(\mathbb{R})$ the space of all functions $f : \mathbb{R} \to \mathbb{C}$ which have continuous
derivatives up to order $k$. Define $(Pf)(x) = i\frac{df}{dx}(x)$ for all $x \in \mathbb{R}$
and all $f \in C^k(\mathbb{R})$ for $k \ge 1$. Next define the following subset of $L^2(\mathbb{R})$:
$D_k = \{f \in L^2(\mathbb{R}) \cap C^k(\mathbb{R}) : \frac{df}{dx} \in L^2(\mathbb{R})\}$. Show that $D_k$ is a dense linear
subspace of $L^2(\mathbb{R})$. Then show that $P$ defines a linear operator $P_k$ in $L^2(\mathbb{R})$ with
domain $D(P_k) = D_k$ and that $P_{k+1} \subseteq P_k$ for $k = 1, 2, \dots$.

4. Show that the set (20.2) is a linear subspace of the Hilbert space.

5. Prove: The domain $D(A^*)$ of the adjoint of a densely defined linear operator $A$
(see (20.2)) is a linear subspace of $\mathcal{H}$ which can also be defined as

$$D(A^*) = \Bigl\{x \in \mathcal{H} : \sup_{y \in D(A),\ \|y\| = 1} |\langle x, Ay\rangle| < \infty\Bigr\}. \tag{20.6}$$

6. Let $A$ be a linear operator in a Hilbert space $\mathcal{H}$ whose domain $D(A)$ is not dense
in $\mathcal{H}$. Characterize the nonuniqueness in the definition of an adjoint.

7. Let $\mathcal{H}$ be a Hilbert space with inner product $\langle \cdot, \cdot\rangle$. Suppose that there is another
inner product $\langle \cdot, \cdot\rangle_1$ on the vector space $\mathcal{H}$ and there are two positive numbers
$\alpha, \beta$ such that

$$\alpha\langle x, x\rangle \le \langle x, x\rangle_1 \le \beta\langle x, x\rangle \quad \forall\, x \in \mathcal{H}.$$

Consider the antilinear canonical isomorphisms $J$ (respectively $J_1$) between $\mathcal{H}$
and its topological dual $\mathcal{H}'$, defined by $J(x)(y) = \langle x, y\rangle$ (respectively $J_1(x)(y) =
\langle x, y\rangle_1$) for all $x, y \in \mathcal{H}$. Prove:

a) Both inner products define the same topology on $\mathcal{H}$.

b) Both $(\mathcal{H}, \langle \cdot, \cdot\rangle)$ and $(\mathcal{H}, \langle \cdot, \cdot\rangle_1)$ are Hilbert spaces.

c) Let $A$ be a linear operator in $\mathcal{H}$ with domain $D(A) = \mathcal{H}$ and $\|Ax\| \le C\|x\|$
for all $x \in \mathcal{H}$, for some constant $0 \le C < \infty$. This operator then defines an
operator $A' : \mathcal{H}' \to \mathcal{H}'$ by $\ell \mapsto A'\ell$, $A'\ell(x) = \ell(Ax)$ for all $\ell \in \mathcal{H}'$ and all
$x \in \mathcal{H}$. Use the maps $J$ and $J_1$ to relate the adjoints $A^*$ (respectively $A_1^*$) of
$A$ with respect to the inner product $\langle \cdot, \cdot\rangle$ (respectively with respect to $\langle \cdot, \cdot\rangle_1$) to
the operator $A'$.

d) Show the relation between $A^*$ and $A_1^*$.

8. Prove Lemma 20.2. 
9. Prove Corollary 20.2. 


Chapter 21 
Quadratic Forms 


Quadratic forms are a powerful tool for the construction of self-adjoint operators, in
particular in situations when the natural strategy fails (for instance for the addition
of linear operators). For this reason we give a brief introduction to the theory of
quadratic forms. After the basic concepts have been introduced and explained by
some examples, we give the main results of the representation theory of quadratic
forms including detailed proofs. The power of these representation theorems is
illustrated through several important applications (Friedrichs extension, form sum of
operators).


21.1 Basic Concepts. Examples 


We begin by collecting the basic concepts of the theory of quadratic forms on a 
Hilbert space. 


Definition 21.1 Let $\mathcal{H}$ be a complex Hilbert space with inner product $\langle \cdot, \cdot\rangle$. A
quadratic form $E$ with domain $D(E) = D$, where $D$ is a linear subspace of $\mathcal{H}$, is
a mapping $E : D \times D \to \mathbb{C}$ which is antilinear in the first and linear in the second
argument. A quadratic form $E$ in $\mathcal{H}$ is called

a) symmetric if, and only if, $E(\phi, \psi) = \overline{E(\psi, \phi)}$ for all $\phi, \psi \in D(E)$;
b) densely defined if, and only if, its domain $D(E)$ is dense in $\mathcal{H}$;
c) semibounded (from below) if, and only if, there is a $\lambda \in \mathbb{R}$ such that for all
$\psi \in D(E)$,

$$E(\psi, \psi) \ge -\lambda \|\psi\|^2;$$

this number $\lambda$ is called a lower bound of $E$;
d) positive if, and only if, $E$ is semibounded with lower bound $\lambda = 0$;
e) continuous if, and only if, there is a constant $C$ such that

$$|E(\phi, \psi)| \le C \|\phi\|\, \|\psi\| \quad \forall\, \phi, \psi \in D(E).$$


Based on these definitions, one introduces several other important concepts. 


© Springer International Publishing Switzerland 2015 295 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_21 




Definition 21.2

a) A semibounded quadratic form $E$ with lower bound $\lambda$ is called closed if, and only
if, the form domain $D(E)$ is complete when it is equipped with the form norm

$$\|\psi\|_E = \sqrt{E(\psi, \psi) + (\lambda + 1)\|\psi\|^2}.$$

b) A quadratic form $F$ with domain $D(F)$ is called an extension of a quadratic form
$E$ with domain $D(E)$ if, and only if, $D(E) \subseteq D(F)$ and $F(\phi, \psi) = E(\phi, \psi)$ for
all $\phi, \psi \in D(E)$.

c) A quadratic form is called closable if, and only if, it has a closed extension.

d) A subset $D' \subseteq D(E)$ of the domain of a closed quadratic form $E$ is called a core
if, and only if, $D'$ is dense in the form domain $D(E)$ equipped with the form norm
$\|\cdot\|_E$.


These definitions are illustrated by several not atypical examples. 


1. The inner product $\langle \cdot, \cdot\rangle$ of a complex Hilbert space $\mathcal{H}$ is a positive closed quadratic
form with domain $\mathcal{H}$.

2. Suppose that $A$ is a linear operator in the complex Hilbert space $\mathcal{H}$ with domain
$D(A)$. In a natural way we can associate with $A$ two quadratic forms with form
domain $D(A)$:

$$E_1(\phi, \psi) = \langle \phi, A\psi\rangle, \qquad E_2(\phi, \psi) = \langle A\phi, A\psi\rangle.$$

We now relate properties of these quadratic forms to properties of the linear
operator $A$. The form $E_2$ is always positive and symmetric. The quadratic forms are
densely defined if, and only if, the operator $A$ is. If the operator $A$ is symmetric,
the quadratic form $E_1$ is densely defined and symmetric. The form $E_1$ is
semibounded if, and only if, the operator $A$ is bounded from below, i.e., if, and only
if, there is some $\lambda \in \mathbb{R}$ such that $\langle \psi, A\psi\rangle \ge \lambda\langle \psi, \psi\rangle$ for all $\psi \in D(A)$.
Since the form norm $\|\cdot\|_{E_2}$ is equal to the graph norm of the operator $A$, the
quadratic form $E_2$ is closed if, and only if, the operator $A$ is closed.
It is important to note that even for a closed operator $A$ the quadratic form $E_1$ is
not necessarily closed.
Both quadratic forms $E_1$ and $E_2$ are continuous if $A$ is continuous, i.e., if there
is some constant $C$ such that $\|A\psi\| \le C\|\psi\|$ for all $\psi \in D(A)$.
The proof of all these statements is left as an exercise.

3. Suppose $\Omega \subset \mathbb{R}^n$ is an open nonempty set. We know that $D = C_0^\infty(\Omega)$ is a dense
subspace in the complex Hilbert space $\mathcal{H} = L^2(\Omega)$. On D a quadratic form E is
well defined by


$$E(\phi,\psi) = \int_\Omega \overline{\nabla\phi}\cdot\nabla\psi \, dx = \int_\Omega \sum_{j=1}^n \overline{\partial_j\phi(x)}\,\partial_j\psi(x)\, dx \qquad \forall\, \phi,\psi \in D.$$


This quadratic form is called the Dirichlet form on $\Omega$. It is densely defined and
positive, but not closed. However, the Dirichlet form has a closed extension. The
completion of the domain $C_0^\infty(\Omega)$ with respect to the form norm $\|\cdot\|_E$,

$$\|\psi\|_E^2 = \int_\Omega \left[\,|\psi(x)|^2 + |\nabla\psi(x)|^2\,\right] dx,$$

is just the Sobolev space $H_0^1(\Omega)$, which is a Hilbert space with the inner product

$$(\phi,\psi)_{1,2} = (\phi,\psi)_2 + \sum_{i=1}^n (\partial_i\phi, \partial_i\psi)_2.$$

Here we use the abbreviation $\partial_i = \frac{\partial}{\partial x_i}$, and $(\cdot,\cdot)_2$ is the inner product of the Hilbert
space $L^2(\Omega)$. (Basic facts about completions are given in Appendix 34.1.)
4. As in the previous example we are going to define a quadratic form E with domain
$D(E) = C_0^\infty(\Omega)$ in the Hilbert space $\mathcal{H} = L^2(\Omega)$. Suppose we are given real
valued functions $A_j \in L^2_{loc}(\Omega)$, $j = 1, \ldots, n$. On D(E) we define


$$E(\phi,\psi) = \int_\Omega \sum_{j=1}^n \overline{(-i\partial_j\phi + A_j\phi)}\,(-i\partial_j\psi + A_j\psi)\, dx.$$


(Note that the assumption $A_j \in L^2_{loc}(\Omega)$ ensures $A_j\phi \in L^2(\Omega)$ for all $\phi \in D(E)$.)
This quadratic form is densely defined and positive, but not closed. Later we will
come back to this example.

5. In the Hilbert space $\mathcal{H} = L^2(\mathbb{R})$ the subspace $D = C_0^\infty(\mathbb{R})$ is dense. On this
subspace we define a quadratic form $E_5$,


$$E_5(f,g) = \overline{f(0)}\,g(0).$$


It is trivial to see that $E_5$ is a positive quadratic form. It is also trivial to see that $E_5$
is not closed. We show now that $E_5$ does not have any closed extension. To this end
consider a sequence of functions $f_n \in D$ which have the following properties: (i)
$0 \le f_n(x) \le 1$ for all $x \in \mathbb{R}$; (ii) $f_n(0) = 1$; (iii) $\mathrm{supp}\, f_n \subseteq [-\frac{1}{n}, \frac{1}{n}]$, for all $n \in \mathbb{N}$.
It follows that, as $n \to \infty$,


$$\|f_n\|_2^2 \le \frac{2}{n} \to 0, \qquad \|f_n\|_{E_5}^2 = |f_n(0)|^2 + \|f_n\|_2^2 \le 1 + \frac{2}{n}.$$


Property (ii), together with $\|f_n\|_2 \to 0$, implies that $(f_n)_{n \in \mathbb{N}}$ is a Cauchy sequence
with respect to the form norm $\|\cdot\|_{E_5}$, since $E_5(f_n - f_m, f_n - f_m) = |f_n(0) - f_m(0)|^2 = 0$.
Suppose F is a closed extension of $E_5$. Then we have the contradiction


$$0 = F(0,0) = \lim_{n\to\infty} F(f_n, f_n) = \lim_{n\to\infty} E_5(f_n, f_n) = 1,$$

hence $E_5$ has no closed extension.
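The mechanism behind Example 5 can be watched numerically. The following is a minimal sketch, not part of the original text: it assumes tent functions $f_n(x) = \max(0, 1 - n|x|)$ in place of smooth functions from $C_0^\infty(\mathbb{R})$ (they satisfy properties (i) to (iii) but are only piecewise linear), and the helper names `tent`, `l2_norm_sq`, `point_form`, and `form_dist_sq` are hypothetical. It illustrates that the $L^2$ norm tends to zero and the sequence is Cauchy in the form norm, while the value of the form stays equal to 1.

```python
def tent(n, x):
    """Tent function f_n(x) = max(0, 1 - n|x|).

    Assumption for illustration: piecewise linear instead of smooth.
    It has 0 <= f_n <= 1, f_n(0) = 1 and support [-1/n, 1/n].
    """
    return max(0.0, 1.0 - n * abs(x))


def l2_norm_sq(f, a, b, steps=20000):
    """Approximate the integral of |f|^2 over [a, b] (composite midpoint rule)."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) ** 2 for k in range(steps)) * h


def point_form(f, g):
    """E_5(f, g) = conj(f(0)) * g(0); the functions used here are real-valued."""
    return f(0.0) * g(0.0)


def form_dist_sq(n, m):
    """Squared form-norm distance E_5(d, d) + ||d||_2^2 for d = f_n - f_m."""
    d = lambda x: tent(n, x) - tent(m, x)
    return point_form(d, d) + l2_norm_sq(d, -1.0, 1.0)


for n in (1, 10, 100):
    f = lambda x, n=n: tent(n, x)
    # columns: n, ||f_n||_2^2, E_5(f_n, f_n), form distance of f_n and f_2n
    print(n, l2_norm_sq(f, -1.0 / n, 1.0 / n), point_form(f, f), form_dist_sq(n, 2 * n))
```

The second column shrinks like $2/(3n)$ and the last column tends to zero, whereas the third column is constant, which is exactly the obstruction to closability used in the argument above.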


These examples show: 


1. There are closed, closable, nonclosable, symmetric, semibounded, densely 
defined, and continuous quadratic forms. 




2. Even positive and symmetric quadratic forms are not necessarily closable (see 
Example 5), in contrast to the situation for symmetric operators. 

3. Positive quadratic forms are closable in special cases when they are defined in 
terms of linear operators. 

4. There are positive quadratic forms which cannot be defined in terms of a linear 
operator in the sense of Example 2 (see the Exercises). 


21.2 Representation of Quadratic Forms 


We have learned in the previous section that linear operators can be used to define 
quadratic forms (Example 2) and we have mentioned an example of a densely defined 
positive quadratic form which cannot be represented by a linear operator in the 
sense of this example. Naturally the question arises which quadratic forms can be 
represented in terms of a linear operator. The main result of this section will be 
that densely defined, semi-bounded, closed quadratic forms can be represented by 
self-adjoint operators bounded from below, and this correspondence is one-to-one. 
We begin with the simplest case. 


Theorem 21.1 Let E be a densely defined continuous quadratic form in the Hilbert
space $\mathcal{H}$. Then there is a unique continuous linear operator $A : \mathcal{H} \to \mathcal{H}$ such that


$$E(x,y) = (x, Ay) \qquad \forall\, x, y \in D(E).$$


In particular this quadratic form can be extended to the quadratic form F(x, y) = 
(x, Ay) with domain H. 


Proof Since E is supposed to be continuous the estimate $|E(x,y)| \le C\|x\|\,\|y\|$
is available for all $x, y \in D(E)$. Since D(E) is dense in $\mathcal{H}$, every $x \in \mathcal{H}$ is the
limit of a sequence $(x_n)_{n\in\mathbb{N}} \subset D(E)$. Thus, given $x, y \in \mathcal{H}$, there are sequences
$(x_n)_{n\in\mathbb{N}}, (y_n)_{n\in\mathbb{N}} \subset D(E)$ such that $x = \lim_{n\to\infty} x_n$ and $y = \lim_{n\to\infty} y_n$. As
convergent sequences in the Hilbert space these sequences are bounded, by some $M_1$,
respectively $M_2$. For all $n, m \in \mathbb{N}$ we estimate the quadratic form as follows:


$$\begin{aligned}
|E(x_n, y_n) - E(x_m, y_m)| &= |E(x_n - x_m, y_n) + E(x_m, y_n - y_m)| \\
&\le |E(x_n - x_m, y_n)| + |E(x_m, y_n - y_m)| \\
&\le C\|x_n - x_m\|\,\|y_n\| + C\|x_m\|\,\|y_n - y_m\| \\
&\le C M_2 \|x_n - x_m\| + C M_1 \|y_n - y_m\|.
\end{aligned}$$


This shows that $(E(x_n, y_n))_{n\in\mathbb{N}}$ is a Cauchy sequence in the field $\mathbb{C}$. We denote its
limit by F(x, y). As above, one shows that this limit does not depend on the sequences
$(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ but only on their limits x, respectively y. Thus, $F : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$
is well defined.

Basic rules of calculation for limits imply that F too is a quadratic form, i.e.,
antilinear in the first and linear in the second argument. Furthermore, F satisfies the
same estimate $|F(x,y)| \le C\|x\|\,\|y\|$ on $\mathcal{H} \times \mathcal{H}$. Hence, for every $x \in \mathcal{H}$ the mapping
$\mathcal{H} \ni y \mapsto F(x,y) \in \mathbb{C}$ is a continuous linear functional. The Riesz-Fréchet theorem
(Theorem 16.3) implies that there is a unique $x^* \in \mathcal{H}$ such that $F(x,y) = (x^*, y)$
for all $y \in \mathcal{H}$. Since F is antilinear in the first argument as the inner product, the
mapping $\mathcal{H} \ni x \mapsto x^*$ is linear, and thus defines a linear operator $B : \mathcal{H} \to \mathcal{H}$ by
$Bx = x^*$ for all $x \in \mathcal{H}$. This shows that $F(x,y) = (Bx, y)$ for all $x, y \in \mathcal{H}$ and thus,
defining $A = B^*$, we get


$$F(x,y) = (x, Ay) \qquad \forall\, x, y \in \mathcal{H}.$$


Since F is continuous, the bound $|(x, Ay)| \le C\|x\|\,\|y\|$ is available for all $x, y \in \mathcal{H}$.
We deduce easily that $\|Ax\| \le C\|x\|$ for all $x \in \mathcal{H}$ and hence the operator A is
continuous.

Considerably deeper are the following two results which represent the core of the 
representation theory for quadratic forms. 


Theorem 21.2 (First Representation Theorem) Let $\mathcal{H}$ be a complex Hilbert space
and E a densely defined, closed, and semibounded quadratic form in $\mathcal{H}$. Then there
is a self-adjoint operator A in $\mathcal{H}$ which is bounded from below and which defines the
quadratic form in the following sense:


a) $$E(x,y) = (x, Ay) \qquad \forall\, x \in D(E),\ \forall\, y \in D(A) \subseteq D(E). \qquad (21.1)$$
b) The domain D(A) of the operator is a core of the quadratic form E.
c) If for $y \in D(E)$ there exists $y^* \in \mathcal{H}$ such that $E(x,y) = (x, y^*)$ for all elements
x of a core of E, then it follows: $y \in D(A)$ and $y^* = Ay$, i.e.,
i) the operator A is uniquely determined by Eq. (21.1);
ii) if $D'$ is a core of the quadratic form E, then the domain of the operator A is
characterized by


$$D(A) = \{\, y \in D(E) : \exists\, C_y \in \mathbb{R}^+,\ |E(x,y)| \le C_y \|x\| \ \ \forall\, x \in D' \,\}.$$

Proof If $\lambda \ge 0$ is a bound of the quadratic form E, the form norm $\|\cdot\|_E$ of E comes
from the inner product

$$(x,y)_E = E(x,y) + (\lambda + 1)(x,y) \qquad (21.2)$$

on D(E). Since E is closed, the form domain D(E) equipped with the inner product
$(\cdot,\cdot)_E$ is a complex Hilbert space which we call $\mathcal{H}_E$. The estimate $E(x,x) + \lambda\|x\|^2 \ge 0$ implies
that


$$\|x\| \le \|x\|_E \qquad \forall\, x \in D(E). \qquad (21.3)$$


Thus, for fixed $x \in \mathcal{H}$, the mapping $\mathcal{H}_E \ni y \mapsto (x,y) \in \mathbb{C}$ defines a continuous
linear functional on the Hilbert space $\mathcal{H}_E$. Apply the theorem of Riesz-Fréchet to get
a unique $x^* \in \mathcal{H}_E$ such that $(x,y) = (x^*, y)_E$ for all $y \in \mathcal{H}_E$. As in the proof of
the representation theorem for continuous quadratic forms, it follows that the map
$x \mapsto x^*$ defines a linear operator $J : \mathcal{H} \to \mathcal{H}_E$. Hence, J is characterized by the
identity


$$(x,y) = (Jx, y)_E \qquad \forall\, x \in \mathcal{H},\ \forall\, y \in \mathcal{H}_E. \qquad (21.4)$$


Since E is densely defined, the domain D(E) is dense in $\mathcal{H}$. Suppose $Jx = 0$, then
$(x,y) = 0$ for all $y \in D(E)$ and therefore $x = 0$, i.e., the operator J is injective. By
construction we have

$$J\mathcal{H} \subseteq \mathcal{H}_E \subseteq \mathcal{H}.$$


This allows us to calculate, for all $x, y \in \mathcal{H}$, using Eq. (21.4),


$$(x, Jy) = (Jx, Jy)_E = \overline{(Jy, Jx)_E} = \overline{(y, Jx)} = (Jx, y),$$

i.e., the operator J is symmetric. It is also bounded since


$$\|Jy\| \le \|Jy\|_E = \sup\{\, |(x, Jy)_E| : x \in \mathcal{H}_E,\ \|x\|_E \le 1 \,\}
= \sup\{\, |(x,y)| : x \in \mathcal{H}_E,\ \|x\|_E \le 1 \,\} \le \|y\|.$$


Hence, J is a self-adjoint continuous operator with trivial null space $N(J) = \{0\}$,
$\mathrm{ran}\, J \subseteq \mathcal{H}_E$, and $\|J\| = \sup\{\, \|Jy\| : y \in \mathcal{H},\ \|y\| \le 1 \,\} \le 1$. The range of J is dense
in $\mathcal{H}$ since its orthogonal complement is trivial: $(\mathrm{ran}\, J)^\perp = N(J^*) = N(J) = \{0\}$.
Here Lemma 20.2 is used.

Now we can define a linear operator A as a simple modification of the inverse
of J:


$$Ay = J^{-1}y - (\lambda + 1)y \qquad \forall\, y \in D(A) = \mathrm{ran}\, J. \qquad (21.5)$$


By Lemma 20.2 or Theorem 20.4 the operator $J^{-1}$ is self-adjoint, hence A is
self-adjoint. This operator A indeed represents the quadratic form E as claimed
in Eq. (21.1). To see this take any $x \in D(E)$ and any $y \in D(A) \subseteq D(E)$ and
calculate


$$E(x,y) = (x,y)_E - (\lambda + 1)(x,y) = (x, J^{-1}y) - (\lambda + 1)(x,y) = (x, Ay).$$


Since $\lambda$ is a bound of the quadratic form E, the operator A is bounded from below:
For all $y \in D(A)$ the estimate $(y, Ay) + \lambda(y,y) = E(y,y) + \lambda(y,y) \ge 0$ is available.

Next, we show that the domain of the operator A is a core of the quadratic
form E by showing that $D(A) = \mathrm{ran}\, J$ is dense in the Hilbert space $\mathcal{H}_E$. Take any
$x \in (\mathrm{ran}\, J)^\perp \subseteq \mathcal{H}_E$, where the orthogonal complement is taken in $\mathcal{H}_E$. Equation (21.4) implies that


$$(x,y) = (x, Jy)_E = 0 \qquad \forall\, y \in \mathcal{H},$$


and thus $x = 0$ and accordingly D(A) is dense in $\mathcal{H}_E$.




Finally, we prove part (c). Suppose $D'$ is a core of E and suppose that for some
$y \in D(E)$ there is a $y^* \in \mathcal{H}$ such that


$$E(x,y) = (x, y^*) \qquad \forall\, x \in D'.$$


Since $\|x\| \le \|x\|_E$, both sides of this identity are continuous with respect to the form
norm and thus this identity has a unique $\|\cdot\|_E$-continuous extension to all of the form
domain D(E). In particular, for all $x \in D(A) \subseteq D(E)$, we know $E(x,y) = (x, y^*)$.
But for $x \in D(A)$ and $y \in D(E)$ the representation $E(x,y) = (Ax, y)$ holds
according to Eq. (21.1). This shows that


$$(Ax, y) = E(x,y) = (x, y^*) \qquad \forall\, x \in D(A),$$


and it follows that $y \in D(A^*)$ and $y^* = A^*y$. Since A is self-adjoint, this means
$y \in D(A)$ and $y^* = Ay$. The characterization of the domain D(A) then is obvious
from the above considerations. Thus we conclude.


Theorem 21.3 (Second Representation Theorem) Under the same assumption as
in the first representation theorem, let $\lambda$ be a bound of E and let A be the self-adjoint
operator determined by E according to Theorem 21.2. Then it follows:


a) $A + \lambda I \ge 0$;
b) $D(\sqrt{A + \lambda I}) = D(E)$ and for all $x, y \in D(E)$ the identity


$$E(x,y) + \lambda(x,y) = (\sqrt{A + \lambda I}\,x, \sqrt{A + \lambda I}\,y) \qquad (21.6)$$


holds;
c) a subset $D' \subset D(E)$ is a core of the quadratic form E if, and only if, it is a core
of the operator $\sqrt{A + \lambda I}$.


Proof The fact that a positive self-adjoint operator B has a unique square root $\sqrt{B}$,
which is a self-adjoint operator with domain $D(\sqrt{B}) \supseteq D(B)$ and characteristic
identity $(\sqrt{B})^2 = B$, will be shown in Theorem 28.1 (and for bounded operators
B in the next chapter, Theorem 22.4). D(B) is a core of the square root $\sqrt{B}$. Here,
we simply use these results for an interesting and important extension of the first
representation theorem of quadratic forms.

Thus, the positive operator $A + \lambda I$ has a unique self-adjoint square root $\sqrt{A + \lambda I}$
on a domain $D = D(\sqrt{A + \lambda I}) \supseteq D(A)$. As in Example 2 of the previous section,
define a quadratic form $E'$ on this domain by $E'(x,y) = (\sqrt{A + \lambda I}\,x, \sqrt{A + \lambda I}\,y)$.
Since $\sqrt{A + \lambda I}$ is self-adjoint we know from Example 2 that $E'$ is a positive, closed,
and densely defined quadratic form. On $D(A) \subseteq D(E')$ we can relate this form to
the operator A itself and thus to the original quadratic form E:


$$\begin{aligned}
E'(x,y) &= (\sqrt{A + \lambda I}\,x, \sqrt{A + \lambda I}\,y) = (x, \sqrt{A + \lambda I}\,\sqrt{A + \lambda I}\,y) \\
&= (x, (A + \lambda I)y) = E(x,y) + \lambda(x,y).
\end{aligned}$$


According to the results from spectral theory the domain of A is a core of the operator
$\sqrt{A + \lambda I}$. Hence, D(A) is a core of the quadratic form $E'$. According to part (b) of
Theorem 21.2, D(A) is also a core of the quadratic form E. Hence, the quadratic
forms $E'$ and $E + \lambda(\cdot,\cdot)$ agree. This proves part (b). Part (c) follows immediately
from part (b).


21.3 Some Applications


Given two densely defined operators we will construct, under certain assumptions 
about these operators, self-adjoint operators using the representation theorems of 
quadratic forms, for three important cases. The results which we obtain in this way 
have many applications, in particular in quantum mechanics, but not only there. 


Theorem 21.4 Suppose that B is a densely defined closed linear operator in the 
complex Hilbert space H. Then, on the domain 


D(B*B) = {x € D(B): Bx € D(B*)}, 


the operator B* B is positive and self-adjoint. The domain D(B* B) is a core of the 
operator B. 


Proof On the domain of the operator define a quadratic form E(x, y) = (Bx, By) 
for all x, y € D(B). One proves (see Example 2 above) that this is a densely defined, 
positive, and closed quadratic form. So the first representation theorem applies: 
There is a unique self-adjoint operator A with domain D(A) C D(B) such that 
(Bx, By) = (x, Ay) for all x € D(B) and all y € D(A). This implies first that 
By € D(B*) for y € D(A) and then that B*B is an extension of A: A C B*B. 
Hence, B*B is a densely defined linear operator. Now it follows easily that B* B is 
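symmetric and thus $A \subseteq B^*B \subseteq (B^*B)^* \subseteq A^* = A$, i.e., $A = B^*B$. Part (b)
of the first representation theorem finally proves that $D(B^*B) = D(A)$ is a core of the
operator B, since the form norm of E coincides with the graph norm of B.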
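
In finite dimensions every operator is bounded and closed, so the content of Theorem 21.4 reduces to linear algebra that can be checked directly. The following sketch is an illustration only, under the assumption that B is a hypothetical $3 \times 2$ complex matrix chosen for this example; the helpers `adjoint`, `matmul`, `matvec`, and `inner` are not from the text. It checks that the form $E(x,y) = (Bx, By)$ is represented by $A = B^*B$, and that A is Hermitian and positive, with the inner product antilinear in the first argument as in this chapter.

```python
def adjoint(M):
    """Conjugate transpose of a rectangular complex matrix."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]


def matmul(A, B):
    """Matrix product of A (p x q) and B (q x r)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matvec(M, v):
    """Matrix-vector product."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def inner(x, y):
    """Inner product, antilinear in the first and linear in the second argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


# Hypothetical data chosen only for illustration.
B = [[1 + 2j, 0j], [3 + 0j, -1j], [0.5 + 0j, 2 + 0j]]
A = matmul(adjoint(B), B)          # A = B* B, a 2 x 2 matrix

x = [1 - 1j, 2 + 0j]
y = [0.5j, -3 + 0j]

form_value = inner(matvec(B, x), matvec(B, y))   # E(x, y) = (Bx, By)
operator_value = inner(x, matvec(A, y))          # (x, Ay)
print(form_value, operator_value)
print(inner(x, matvec(A, x)))                    # (x, Ax) = ||Bx||^2 >= 0
```

The domain questions that make the theorem nontrivial for unbounded B do not appear in this finite-dimensional setting; the sketch only illustrates the algebraic identity $(Bx, By) = (x, B^*By)$.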
As we had argued earlier a symmetric operator can have, in some cases, many self- 
adjoint extensions. For positive symmetric operators one can construct a “smallest” 
self-adjoint extension, using again the representation results for quadratic forms. 


Theorem 21.5 (Friedrichs Extension) Let A be a positive (or lower bounded)
symmetric linear operator in a complex Hilbert space $\mathcal{H}$. Then A has a positive
self-adjoint extension $A_F$ which is the smallest among all positive self-adjoint extensions
in the sense that it has the smallest form domain. This extension $A_F$ is called the
Friedrichs extension of A.


Proof We give the proof for the case of a positive symmetric operator. The neces- 
sary modifications for the case of a lower bounded symmetric operator are obvious 
(compare the proofs of the representation theorems). 

On the domain of the operator define a quadratic form $E(x,y) = (x, Ay)$ for all
$x, y \in D(E) = D(A)$. E is a densely defined positive quadratic form and $(x,y)_E =
E(x,y) + (x,y)$ defines an inner product on D(E). This inner product space has a
completion $\mathcal{H}_E$ which is a Hilbert space and in which the space D(E) is sequentially
dense. The quadratic form E has an extension $E_1$ to this Hilbert space which is
defined as $E_1(x,y) = \lim_{n\to\infty} E(x_n, y_n)$ whenever $x = \lim_{n\to\infty} x_n$, $y = \lim_{n\to\infty} y_n$,
$x_n, y_n \in D(E)$ for all $n \in \mathbb{N}$. The resulting quadratic form $E_1$ is a closed densely
defined positive quadratic form. It is called the closure of the quadratic form E.

The first representation theorem, applied to the quadratic form $E_1$, gives a unique
positive self-adjoint operator $A_F$ such that


$$E_1(x,y) = (x, A_F y) \qquad \forall\, x \in D(E_1),\ \forall\, y \in D(A_F) \subseteq D(E_1).$$


For $x, y \in D(A)$ one has $(x, Ay) = E(x,y) = E_1(x,y)$ and hence $A_F$ is an extension
of A.

Finally, we prove that $A_F$ is the smallest self-adjoint positive extension of A.
Suppose $B \ge 0$ is a self-adjoint extension of A. The associated quadratic form
$E_B(x,y) = (x, By)$ on D(B) then is an extension of the form E. Hence, the closure
$\overline{E}_B$ of the quadratic form $E_B$ is an extension of the closure $E_1$ of the quadratic
form E. The second representation theorem implies: The form domain of $\overline{E}_B$ is
the domain $D(\sqrt{B})$ and the form domain of $E_1$ is the domain $D(\sqrt{A_F})$, hence
$D(\sqrt{A_F}) \subseteq D(\sqrt{B})$ and thus we conclude.

Note that in this proof we have used the following facts about positive self-adjoint
operators B which are of interest on their own. Recall first that the domain D(B) is
contained in the domain of the square root $\sqrt{B}$ of B and that D(B) is a core for the
operator $\sqrt{B}$. With B we can associate two densely defined positive quadratic forms:
$E_1(x,y) = (x, By)$ with domain $D(E_1) = D(B)$ and $E_2(x,y) = (\sqrt{B}x, \sqrt{B}y)$ with
domain $D(E_2) = D(\sqrt{B})$. $E_2$ is a closed extension of $E_1$ and actually the closure of
$E_1$ (see Exercises). Thus, $D(\sqrt{B})$ is called the form domain of the positive self-adjoint
operator B.

Our last application of the representation theorems for quadratic forms is con- 
cerned with the sum of two positive self-adjoint operators. There are examples of 
such operators for which the intersection of their domains is trivial and thus the nat- 
ural way to define their sum gives an uninteresting result. In some cases quadratic 
forms and their representation can help to define the form sum of such operators. 

Suppose A and B are two positive self-adjoint operators in the Hilbert space $\mathcal{H}$
such that $D = D(\sqrt{A}) \cap D(\sqrt{B})$ is dense in $\mathcal{H}$. Then a densely defined positive
quadratic form E is naturally defined on D by


$$E(x,y) = (\sqrt{A}x, \sqrt{A}y) + (\sqrt{B}x, \sqrt{B}y).$$


The closure $E_1$ of this quadratic form is then a closed positive densely defined
quadratic form to which the first representation theorem can be applied. Hence, there
is a unique positive self-adjoint operator C with domain $D(C) \subseteq D(E_1)$ such that
for all $x \in D(E_1)$ and all $y \in D(C)$ the standard representation $E_1(x,y) = (x, Cy)$
holds. This self-adjoint operator C is called the form sum of A and B. One writes


$$C = A \dot{+} B. \qquad (21.7)$$


Typically, the construction of the form sum is used in the theory of Schrödinger
operators in those cases where the potential V has a local singularity so strong
that it prevents V from being locally square integrable. A simple case for this construction
is considered below.

Let $H_0 = \frac{1}{2m} P^2$ be the free Hamilton operator in the Hilbert space $\mathcal{H} = L^2(\mathbb{R}^n)$.
On $D_0 = C_0^\infty(\mathbb{R}^n)$ the momentum operator P is given by $\frac{1}{i}\nabla$. (Some details of the
construction of the free Hamilton operator as a self-adjoint operator in $L^2(\mathbb{R}^n)$ are
considered in the exercises using Theorem 21.4.) Suppose that the potential V is a
nonnegative function in $L^1_{loc}(\mathbb{R}^n)$ which does not belong to $L^2_{loc}(\mathbb{R}^n)$. Then $V \cdot \phi$ is
not necessarily square-integrable for $\phi \in D_0$ so that we cannot define the interacting
Schrödinger operator $H_0 + V$ on $D_0$ by $(H_0 + V)\phi = H_0\phi + V\phi$. However, the
assumption $0 \le V \in L^1_{loc}(\mathbb{R}^n)$ ensures that the interacting Schrödinger operator can
be constructed as a self-adjoint operator as the form sum of the free Schrödinger
operator $H_0$ and the interaction V.

On $D_0$ a positive quadratic form $E_0$ is well defined by


$$E_0(\phi,\psi) = \frac{1}{2m} \sum_{j=1}^n (\partial_j\phi, \partial_j\psi)_2 + (\sqrt{V}\phi, \sqrt{V}\psi)_2.$$

Here, as usual, we use the notation $\partial_j = \frac{\partial}{\partial x_j}$, and $(\cdot,\cdot)_2$ is the inner product of the
Hilbert space $L^2(\mathbb{R}^n)$. This quadratic form is closable. Applying the first representation
theorem to the closure E of this quadratic form $E_0$ defines the form sum $H_0 \dot{+} V$
of $H_0$ and V as a self-adjoint positive operator. Thus, we get for all $\phi, \psi \in D_0$,


$$(\phi, (H_0 \dot{+} V)\psi)_2 = \frac{1}{2m} \sum_{j=1}^n (\partial_j\phi, \partial_j\psi)_2 + (\sqrt{V}\phi, \sqrt{V}\psi)_2. \qquad (21.8)$$


21.4 Exercises 


1. Prove: A semi-bounded quadratic form is not necessarily symmetric. 

2. Let A be a linear operator in the complex Hilbert space $\mathcal{H}$ and associate to it the
quadratic forms $E_1$ and $E_2$. Prove all the statements of the second example of the
first section about the relation between the operator A and these quadratic forms.

3. Prove: There is no linear operator A in $L^2(\mathbb{R})$ with domain $D(A) \supseteq C_0^\infty(\mathbb{R})$ such
that


$$(f, Ag)_2 = \overline{f(0)}\,g(0) \quad \text{or} \quad (Af, Ag)_2 = \overline{f(0)}\,g(0) \qquad \forall\, f, g \in C_0^\infty(\mathbb{R}).$$


4. On the subspace $D_0 = C_0^\infty(\mathbb{R}) \subset L^2(\mathbb{R})$ the momentum operator $P_0$ is defined
by $P_0\phi = -i\frac{d\phi}{dx}$. Show: $P_0$ is symmetric. Determine the domain $D(P_0^*)$ of the
adjoint of $P_0$ and the adjoint $P_0^*$ itself. Finally show that $P_0^*$ is self-adjoint.

Hint: Use the Fourier transform on $L^2(\mathbb{R})$ and recall the example of the free
Hamilton operator in the previous chapter.

5. Using the results of the previous problem and Theorem 21.4 determine the domain
on which the free Hamilton operator $H_0 = \frac{1}{2m} P^2$ is self-adjoint.

6. Give the details of the proof of the fact that a densely defined positive quadratic
form $E_0$ is closable and characterize its closure E, i.e., characterize the elements
of the domain of the closure and the values of E at elements of its domain D(E),
in terms of certain limits.

7. Find the closure of the quadratic form of Example 4 in Sect. 21.1. Which self- 
adjoint operator does this closed quadratic form represent? 


Chapter 22 
Bounded Linear Operators 


Linear operators from a Hilbert space $\mathcal{H}$ into a Hilbert space $\mathcal{K}$ are those mappings
$\mathcal{H} \to \mathcal{K}$ which are compatible with the vector space structure on both spaces.
Similarly, the bounded or continuous linear operators are those which are compatible
with both the vector space and the topological structures on both spaces. The fact that
a linear map $\mathcal{H} \to \mathcal{K}$ is continuous if, and only if, it is bounded follows easily from
Corollary 2.1. (A linear map between topological vector spaces is continuous if, and
only if, it is continuous at the origin, which in turn is equivalent to the linear map
being bounded.) This chapter studies the fundamental properties of single bounded
linear operators and of the set $\mathcal{B}(\mathcal{H})$ of all bounded linear operators on a Hilbert space
$\mathcal{H}$. In particular, a product and various important topologies will be introduced in
$\mathcal{B}(\mathcal{H})$. Also examples of bounded operators which are important in quantum physics
will be presented.


22.1 Preliminaries 


Let $\mathcal{H}$ and $\mathcal{K}$ be two Hilbert spaces over the same field $\mathbb{K}$ and A a linear operator
from $\mathcal{H}$ into $\mathcal{K}$. A is called bounded if, and only if, the set


$$\{\, \|Ax\|_{\mathcal{K}} : x \in D(A),\ \|x\|_{\mathcal{H}} \le 1 \,\} \qquad (22.1)$$

is bounded. If A is bounded its norm is defined as the least upper bound of this set:

$$\|A\| = \sup\{\, \|Ax\|_{\mathcal{K}} : x \in D(A),\ \|x\|_{\mathcal{H}} \le 1 \,\}. \qquad (22.2)$$


We will show later that $A \mapsto \|A\|$ is indeed a norm on the vector space of all bounded
linear operators from $\mathcal{H}$ into $\mathcal{K}$.

Linear operators which are not bounded are called unbounded. 

There are several different ways to express that a linear operator is bounded. 


Lemma 22.1 Let A be a linear operator from a Hilbert space $\mathcal{H}$ into a Hilbert
space $\mathcal{K}$. The following statements are equivalent:


a) A is bounded. 


© Springer International Publishing Switzerland 2015 307 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_22 




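b) The set $\{\, \|Ax\|_{\mathcal{K}} : x \in D(A),\ \|x\|_{\mathcal{H}} = 1 \,\}$ is bounded and the norm of A is
$\|A\| = \sup\{\, \|Ax\|_{\mathcal{K}} : x \in D(A),\ \|x\|_{\mathcal{H}} = 1 \,\}$.
c) The set $\left\{ \frac{\|Ax\|_{\mathcal{K}}}{\|x\|_{\mathcal{H}}} : x \in D(A),\ x \ne 0 \right\}$ is bounded and the norm of A is
$\|A\| = \sup\left\{ \frac{\|Ax\|_{\mathcal{K}}}{\|x\|_{\mathcal{H}}} : x \in D(A),\ x \ne 0 \right\}$.
d) There is a $C \in \mathbb{R}^+$ such that $\|Ax\|_{\mathcal{K}} \le C\|x\|_{\mathcal{H}}$ for all $x \in D(A)$ and the norm
is $\|A\| = \inf\{\, C \in \mathbb{R}^+ : \|Ax\|_{\mathcal{K}} \le C\|x\|_{\mathcal{H}} \ \ \forall\, x \in D(A) \,\}$.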


Proof This is a straightforward exercise. 


Corollary 22.1 If A is a bounded linear operator from a Hilbert space $\mathcal{H}$ into a
Hilbert space $\mathcal{K}$, then


$$\|Ax\|_{\mathcal{K}} \le \|A\|\,\|x\|_{\mathcal{H}} \qquad \forall\, x \in D(A). \qquad (22.3)$$


Thus A always has a unique continuous extension to the closure $\overline{D(A)}$ of its domain.
In particular, if A is densely defined, this extension is unique on all of $\mathcal{H}$; if D(A) is
not dense, then one can extend A on $D(A)^\perp$ for instance by 0. Hence in all cases, a
bounded linear operator A can be considered to have the domain $D(A) = \mathcal{H}$.


Proof For $x \in D(A)$, $x \ne 0$, estimate (22.3) is evident from Part (c) of Lemma
22.1. For $x = 0$ we have $Ax = 0$ and thus (22.3) is satisfied.

Concerning the extension observe that the closure of a linear subspace is again a
linear subspace. If $\overline{D(A)} \ni x = \lim_{n\to\infty} x_n$, $(x_n)_{n\in\mathbb{N}} \subset D(A)$, then estimate (22.3)
implies immediately that $(Ax_n)_{n\in\mathbb{N}}$ is a Cauchy sequence in $\mathcal{K}$ and thus has a unique
limit which is called Ax, i.e.,


$$A\left(\lim_{n\to\infty} x_n\right) = \lim_{n\to\infty} Ax_n.$$


Finally, one shows that the limit $\lim_{n\to\infty} Ax_n$ does not depend on the approximating
sequence $(x_n)_{n\in\mathbb{N}}$, but only on its limit x.

However, there are linear operators in an infinite dimensional Hilbert space, which 
are defined on all of the space but which are not bounded. Thus the converse of the 
above corollary does not hold. 


Proposition 22.1 In infinite dimensional Hilbert spaces H. there are linear 
operators A with domain D(A) = H which are not bounded. 


Proof Since we will not use this result we only give a sketch of the proof. The
axiom of choice (or Zorn's Lemma) implies that there exists a maximal set H of
linearly independent vectors in $\mathcal{H}$, i.e., a Hamel basis. This means that every $x \in \mathcal{H}$
has a unique representation as a linear combination of elements $h_j$ of the Hamel
basis H:


$$x = \sum_{j=1}^n a_j h_j, \qquad a_j \in \mathbb{K},\ j = 1, \ldots, n \in \mathbb{N}.$$


Choose a sequence $(h_n)_{n\in\mathbb{N}} \subset H$ and define $Ah_n = n h_n$ for all $n \in \mathbb{N}$ and extend A
by linearity to all of $\mathcal{H}$:


$$Ax = \sum_{j=1}^n a_j A h_j.$$

If in the linear combination an element $h_j$ occurs which does not belong to the
sequence chosen above, define $Ah_j = h_j$ or $Ah_j = 0$. Then the domain of A is $\mathcal{H}$ and A
is not bounded.

In practice these everywhere defined but unbounded linear operators are not im- 
portant. Usually, one has some more information about the linear operator than 
just the fact that it is defined everywhere. And indeed, if such a linear operator is 
symmetric, then it follows that it is bounded. 


Theorem 22.1 (Hellinger-Toeplitz Theorem) Suppose A is a linear operator in the
Hilbert space $\mathcal{H}$ with domain $D(A) = \mathcal{H}$. If A is symmetric, i.e., if $(x, Ay) = (Ax, y)$
for all $x, y \in \mathcal{H}$, then A is bounded.


Proof For the indirect proof assume that A is unbounded. Then there is a sequence
$(y_n)_{n\in\mathbb{N}} \subset \mathcal{H}$, $\|y_n\| = 1$ for all $n \in \mathbb{N}$, such that $\|Ay_n\| \to \infty$ as $n \to \infty$. Now define
a sequence of linear functionals $T_n : \mathcal{H} \to \mathbb{K}$ by $T_n(x) = (y_n, Ax) = (Ay_n, x)$ for all
$x \in \mathcal{H}$. The second representation of $T_n$ implies by Schwarz' inequality that every
functional $T_n$ is continuous. For fixed $x \in \mathcal{H}$ we can use the first representation of $T_n$
to show that the sequence $(T_n(x))_{n\in\mathbb{N}}$ is bounded: $|(y_n, Ax)| \le \|y_n\|\,\|Ax\| \le \|Ax\|$
for all $n \in \mathbb{N}$. Thus the uniform boundedness principle (Theorem 19.6) implies that
there is a $C \in \mathbb{R}^+$ such that $\|T_n\| \le C$ for all $n \in \mathbb{N}$. But this gives a contradiction
to the construction of the $y_n$: $\|Ay_n\|^2 = T_n(Ay_n) \le \|T_n\|\,\|Ay_n\| \le C\|Ay_n\|$ implies
$\|Ay_n\| \le C$.


22.2 Examples 


In order to gain some insight into the various ways in which a linear operator in 
a Hilbert space is bounded, respectively unbounded, we study several examples in 
concrete Hilbert spaces of square integrable functions. 


1. Linear operators of differentiation such as the momentum operator are
unbounded in Hilbert spaces of square integrable functions. Consider for example
the momentum operator $P = \frac{1}{i}\frac{d}{dx}$ in the Hilbert space $\mathcal{H} = L^2([0,1])$. The
functions $e_n(x) = e^{inx}$ obviously have the norm 1, $\|e_n\|_2^2 = \int_0^1 |e^{inx}|^2\,dx = 1$,
and for $Pe_n$ we find $\|Pe_n\|_2^2 = \int_0^1 |-i e_n'(x)|^2\,dx = n^2$, hence $\frac{\|Pe_n\|_2}{\|e_n\|_2} = n$ and the
linear operator P is not bounded (on any domain which contains these exponential
functions).

2. Bounded multiplication operators. Suppose g is an essentially bounded measurable
function on $\mathbb{R}^n$. Then the operator of multiplication $M_g$ with g is a bounded
operator in the Hilbert space $L^2(\mathbb{R}^n)$ since in this case, for almost all $x \in \mathbb{R}^n$,
$|g(x)| \le \|g\|_\infty$, and thus


$$\|M_g f\|_2^2 = \int_{\mathbb{R}^n} |g(x) f(x)|^2\,dx \le \int_{\mathbb{R}^n} \|g\|_\infty^2\, |f(x)|^2\,dx = \|g\|_\infty^2\, \|f\|_2^2$$


for all $f \in L^2(\mathbb{R}^n)$.

3. Unbounded operators of multiplication. Consider the operator of multiplication
with a function which has a sufficiently strong local singularity, for instance the
function $g(x) = x^{-\alpha}$ for $2\alpha > 1$ in the Hilbert space $L^2([0,1])$. In the exercises we
show that this operator is unbounded. Another way that a multiplication operator
$M_g$ in $L^2(\mathbb{R})$ can fail to be bounded is that the function g is not bounded at "infinity". A
very simple example is $g(x) = x$ for all $x \in \mathbb{R}$ on the domain


$$D_n = \left\{ f \in C(\mathbb{R}) : p_{0,n}(f) = \sup_{x \in \mathbb{R}} (1 + x^2)^{n/2} |f(x)| < \infty \right\}.$$

Consider the sequence of functions


$$f_j(x) = \begin{cases} 1 & \text{for } x \in [-j, j], \\ 0 & \text{for } x \notin [-j-1, j+1], \\ \text{linear and continuous} & \text{otherwise.} \end{cases}$$


Certainly, for every $j \in \mathbb{N}$, $f_j \in D_n$ ($n \in \mathbb{N}$ fixed). A straightforward calculation
shows


$$\|f_j\|_2^2 \le 2(j+1) \qquad \text{and} \qquad \|M_g f_j\|_2^2 \ge \frac{2}{3} j^3,$$


hence $\|M_g f_j\|_2^2 \ge \frac{j^3}{3(j+1)} \|f_j\|_2^2$. We conclude that this multiplication operator is not
bounded.

4. Integral operators of Hilbert-Schmidt. Let $k \in L^2(\mathbb{R}^n \times \mathbb{R}^n)$ be given. Then
$\|k\|_2^2 = \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} |k(x,y)|^2\,dx\,dy$ is finite and thus, for almost all $x \in \mathbb{R}^n$, the
integral $\int_{\mathbb{R}^n} |k(x,y)|^2\,dy$ is finite, which allows us to define a linear map
$K : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$ by


$$(K\psi)(x) = \int_{\mathbb{R}^n} k(x,y)\,\psi(y)\,dy \qquad \text{for almost all } x \in \mathbb{R}^n.$$


Again for almost all $x \in \mathbb{R}^n$ this image is bounded by


$$|(K\psi)(x)|^2 \le \int_{\mathbb{R}^n} |k(x,y)|^2\,dy \int_{\mathbb{R}^n} |\psi(y)|^2\,dy = \int_{\mathbb{R}^n} |k(x,y)|^2\,dy\ \|\psi\|_2^2,$$


where Schwarz' inequality is used. We deduce $\|K\psi\|_2 \le \|k\|_2\,\|\psi\|_2$ for all
$\psi \in L^2(\mathbb{R}^n)$ and the integral operator K with kernel k is bounded. Such integral
operators are called Hilbert-Schmidt operators. They played a very important
role in the initial stage of the theory of Hilbert spaces. We indicate briefly some
basic aspects. A comprehensive study of these operators is presented in Chap. 26.


If $\{e_j : j \in \mathbb{N}\}$ is an orthonormal basis of the Hilbert space $L^2(\mathbb{R}^n)$, every
$\psi \in L^2(\mathbb{R}^n)$ has a Fourier expansion with respect to this basis: $\psi = \sum_{j=1}^\infty (e_j, \psi)_2\, e_j$,
$\sum_{j=1}^\infty |(e_j, \psi)_2|^2 = \|\psi\|_2^2$. Similarly, $K\psi = \sum_{i=1}^\infty (e_i, K\psi)_2\, e_i$ and
$\|K\psi\|_2^2 = \sum_{i=1}^\infty |(e_i, K\psi)_2|^2$. Continuity of the operator K and of the inner product imply


$$(e_i, K\psi)_2 = \Big(e_i, K\Big(\sum_{j=1}^\infty (e_j, \psi)_2\, e_j\Big)\Big)_2 = \sum_{j=1}^\infty (e_i, Ke_j)_2\, (e_j, \psi)_2.$$


Hence, the action of the integral operator K on $\psi \in L^2(\mathbb{R}^n)$ can be represented as
the action of the infinite matrix $(K_{ij})_{i,j\in\mathbb{N}}$ on the sequence $(\psi_j)_{j\in\mathbb{N}} \in \ell^2(\mathbb{K})$
of expansion coefficients of $\psi$, where $K_{ij} = (e_i, Ke_j)_2$ and $\psi_j = (e_j, \psi)_2$.
Because of Parseval's relation, since $e_{ij} = e_i \otimes e_j$, $i, j \in \mathbb{N}$, is an orthonormal
basis of the Hilbert space $L^2(\mathbb{R}^n \times \mathbb{R}^n)$, the matrix elements are square summable,
$\sum_{i,j=1}^\infty |K_{ij}|^2 = \|k\|_2^2$.

Now this matrix representation for the integral operator K allows us to rewrite
the integral equation as an infinite linear system over the space $\ell^2(\mathbb{K})$ of square
summable numerical sequences.

Given $f \in L^2(\mathbb{R}^n)$, consider for instance the integral equation


$$u + Ku = f$$


for an unknown function $u \in L^2(\mathbb{R}^n)$. As a linear system over $\ell^2(\mathbb{K})$ this integral
equation reads


$$u_i + \sum_{j=1}^\infty K_{ij} u_j = f_i, \qquad i = 1, 2, \ldots,$$


where naturally $u_i = (e_i, u)_2$ and $f_i = (e_i, f)_2$ for all $i \in \mathbb{N}$ are square summable
sequences.

5. Spin operators. In quantum physics the spin as an internal degree of freedom
plays a very important role. In mathematical terms it is described by a bounded
operator, more precisely by a triple $S = (S_1, S_2, S_3)$ of bounded operators. These
operators will be discussed briefly.

We had mentioned before that the state space of an elementary localizable particle
with spin $s = \frac{j}{2}$, $j = 0, 1, 2, \ldots$, is the Hilbert space


$$\mathcal{H}_s = L^2(\mathbb{R}^3) \otimes \mathbb{C}^{2s+1} = L^2(\mathbb{R}^3; \mathbb{C}^{2s+1}).$$

The elements of $\mathcal{H}_s$ are $(2s+1)$-tuples of complex valued functions $f_m$,
$m = -s, -s+1, \ldots, s-1, s$, in $L^2(\mathbb{R}^3)$. The inner product of $\mathcal{H}_s$ is


$$(f, g) = \sum_{m=-s}^{s} \int_{\mathbb{R}^3} \overline{f_m(x)}\, g_m(x)\,dx \qquad \forall\, f, g \in \mathcal{H}_s.$$


The spin operators $S_j$ act on this space according to the following rules.


$$S_1 = \frac{1}{2}(S_+ + S_-), \qquad S_2 = \frac{1}{2i}(S_+ - S_-),$$

$$\begin{aligned}
(S_3 f)_m(x) &= m f_m(x), \qquad m = -s, -s+1, \ldots, s-1, s, \\
(S_+ f)_m(x) &= \sqrt{(s+m)(s-m+1)}\, f_{m-1}(x), \\
(S_- f)_m(x) &= \sqrt{(s-m)(s+m+1)}\, f_{m+1}(x).
\end{aligned}$$


Clearly these operators are linear and bounded in $\mathcal{H}_s$. In the Exercises we show
that they are self-adjoint: $S_j^* = S_j$ for $j = 1, 2, 3$. Introducing the commutator
notation $[A, B] = AB - BA$ for two bounded linear operators one finds interesting
commutation relations for these spin operators:


$$[S_1, S_2] = iS_3, \qquad [S_2, S_3] = iS_1, \qquad [S_3, S_1] = iS_2.$$


Furthermore, the operator $S^2 = S_1^2 + S_2^2 + S_3^2 = S_+S_- - S_3 + S_3^2$ turns out to be
proportional to the identity operator $I_{\mathcal{H}_s}$ on $\mathcal{H}_s$:


$$(S^2 f)_m(x) = s(s+1) f_m(x), \qquad \text{i.e.,} \qquad S^2 = s(s+1)\, I_{\mathcal{H}_s}.$$


Without going into further details we mention that the operators given above are
a realization or "representation" of the commutation relations for the $S_j$.

6. Wiener-Hopf operators. For a given function $g \in L^1(\mathbb{R})$ define a map
$K_g : L^2(\mathbb{R}^+) \to L^2(\mathbb{R}^+)$ by


$$(K_g f)(x) = \int_0^\infty g(x - y)\, f(y)\,dy \qquad \text{for almost all } x \in \mathbb{R}^+,\ \forall\, f \in L^2(\mathbb{R}^+).$$


It is not quite trivial to show that this operator is indeed a bounded linear operator.
It is done in the Exercises. These Wiener-Hopf operators have a wide range
of applications. They are used for instance in the analysis of boundary value
problems, in filtering problems in information technology and meteorology, and
in time series analysis in statistics.
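
Returning to Example 5, the action of the spin operators on the index m alone is given by $(2s+1) \times (2s+1)$ matrices, so the stated commutation relations and the identity $S^2 = s(s+1)I$ can be checked numerically. The following is a minimal sketch under stated assumptions: the components are ordered as $m = s, s-1, \ldots, -s$ (an arbitrary choice made here), the function names `spin_matrices` and `check` are hypothetical, and only the matrix part acting on $\mathbb{C}^{2s+1}$ is modelled, not the factor $L^2(\mathbb{R}^3)$.

```python
import math


def matmul(A, B):
    """Product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]


def max_abs_diff(A, B):
    """Largest entrywise deviation between two square matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(len(A)) for j in range(len(A)))


def spin_matrices(two_s):
    """Matrices of S_1, S_2, S_3 for spin s = two_s / 2; row k belongs to m = s - k."""
    s = two_s / 2.0
    dim = two_s + 1
    s_plus = [[0j] * dim for _ in range(dim)]
    s_minus = [[0j] * dim for _ in range(dim)]
    s3 = [[0j] * dim for _ in range(dim)]
    for k in range(dim):
        m = s - k
        s3[k][k] = complex(m)
        if k + 1 < dim:   # (S_+ f)_m = sqrt((s+m)(s-m+1)) f_{m-1}
            s_plus[k][k + 1] = complex(math.sqrt((s + m) * (s - m + 1)))
        if k >= 1:        # (S_- f)_m = sqrt((s-m)(s+m+1)) f_{m+1}
            s_minus[k][k - 1] = complex(math.sqrt((s - m) * (s + m + 1)))
    s1 = [[(s_plus[i][j] + s_minus[i][j]) / 2 for j in range(dim)] for i in range(dim)]
    s2 = [[(s_plus[i][j] - s_minus[i][j]) / 2j for j in range(dim)] for i in range(dim)]
    return s1, s2, s3


def check(two_s):
    """Largest violation of the commutation relations and of S^2 = s(s+1) I."""
    s = two_s / 2.0
    dim = two_s + 1
    s1, s2, s3 = spin_matrices(two_s)
    i_times = lambda M: [[1j * M[a][b] for b in range(dim)] for a in range(dim)]
    errors = [
        max_abs_diff(commutator(s1, s2), i_times(s3)),
        max_abs_diff(commutator(s2, s3), i_times(s1)),
        max_abs_diff(commutator(s3, s1), i_times(s2)),
    ]
    squares = [matmul(M, M) for M in (s1, s2, s3)]
    total = [[sum(sq[a][b] for sq in squares) for b in range(dim)] for a in range(dim)]
    casimir = [[complex(s * (s + 1)) if a == b else 0j for b in range(dim)] for a in range(dim)]
    errors.append(max_abs_diff(total, casimir))
    return max(errors)


for two_s in (1, 2, 3, 4):
    print(two_s / 2.0, check(two_s))
```

For $s = \frac{1}{2}$ the matrices produced this way are one half of the Pauli matrices, which is a convenient sanity check on the sign in the definition of $S_2$.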

We conclude this section with a discussion of the famous Heisenberg
commutation relations

$$[Q, P] \subseteq iI$$


for the position operator Q and momentum operator P in quantum mechanics.
The standard realization of these commutation relations in the Hilbert space $L^2(\mathbb{R})$
we had mentioned before: Q is realized as the multiplication operator with the
coordinate variable x while the momentum operator then is $P = -i\frac{d}{dx}$, both on
suitable domains which have been studied in detail earlier. Recall that both
operators are unbounded. It is an elementary calculation to verify these commutation
relations for this case, for instance on the dense subspace $C_0^\infty(\mathbb{R})$.

Now, we ask the question whether there are other realizations of these
commutation relations in terms of bounded operators. A clear answer is given in the
following lemma.


Lemma 22.2 (Lemma of Wielandt) There are no bounded linear operators $Q$ and $P$ in a Hilbert space $\mathcal{H}$ which satisfy the commutation relations $[Q, P] = QP - PQ = i I$ where $I$ is the identity operator in $\mathcal{H}$.




Proof We are going to derive a contradiction from the assumption that two bounded linear operators satisfy these commutation relations.

Observe first that $P^{n+1}Q - QP^{n+1} = P^n[PQ - QP] + [P^nQ - QP^n]P = -iP^n + [P^n, Q]P$. A proof by induction with respect to $n$ gives $[P^{n+1}, Q] = -i(n+1)P^n$ and thus

$$(n+1)\|P^n\| = \|[P^{n+1}, Q]\| \le \|P^{n+1}Q\| + \|QP^{n+1}\|.$$

In the following section one learns that $\|AB\| \le \|A\|\,\|B\|$ holds for bounded linear operators $A$, $B$. Thus we continue the above estimate

$$(n+1)\|P^n\| \le \|P^n\|\,\|PQ\| + \|QP\|\,\|P^n\| \le 2\,\|P^n\|\,\|P\|\,\|Q\|.$$

According to the commutation relation we know $\|P\| > 0$. The relation $[P^2, Q] = -2iP$ implies $\|P^2\| > 0$ and, by induction, $\|P^n\| > 0$ for all $n \in \mathbb{N}$, hence we can divide our estimate by $\|P^n\|$ to get $n + 1 \le 2\,\|Q\|\,\|P\|$ for all $n \in \mathbb{N}$, a contradiction.
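For matrices, that is for a finite-dimensional Hilbert space, the lemma also follows from a trace argument: $\operatorname{tr}(QP - PQ) = 0$ for all square matrices, whereas $\operatorname{tr}(iI) = i \dim \mathcal{H} \neq 0$. The sketch below only illustrates this finite-dimensional remark and is not the proof given above; the helper names and the random test matrices are hypothetical.

```python
import random

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def random_matrix(n, rng):
    """Complex n x n matrix with entries drawn from the square [-1, 1] + i[-1, 1]."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

def commutator_trace(n, seed=0):
    """tr(QP) - tr(PQ) for two random n x n matrices; zero up to rounding."""
    rng = random.Random(seed)
    q, p = random_matrix(n, rng), random_matrix(n, rng)
    return trace(matmul(q, p)) - trace(matmul(p, q))
```

Whatever seed is used, `abs(commutator_trace(n, seed))` stays at rounding level, so no pair of matrices can have the commutator $iI$, whose trace is $i n$.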


22.3 The Space $\mathcal{B}(\mathcal{H}, \mathcal{K})$ of Bounded Linear Operators


Given two Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$ over the field $\mathbb{K}$, the set of all bounded linear operators $A : \mathcal{H} \to \mathcal{K}$ is denoted by $\mathcal{B}(\mathcal{H}, \mathcal{K})$. This section studies the basic properties of this set.

First of all, on this set $\mathcal{B}(\mathcal{H}, \mathcal{K})$ the structure of a $\mathbb{K}$-vector space can naturally be introduced by defining an addition and a scalar multiplication according to the following rules. For $A, B \in \mathcal{B}(\mathcal{H}, \mathcal{K})$ define a map $A + B : \mathcal{H} \to \mathcal{K}$ by

$$(A + B)x = Ax + Bx \qquad \forall\, x \in \mathcal{H},$$

i.e., we add two bounded operators by adding, at each point $x \in \mathcal{H}$, the images $Ax$ and $Bx$. It is straightforward to show that $A + B$, defined in this way, is again a bounded linear operator. The verification is left as an exercise. Similarly, one multiplies a bounded linear operator $A \in \mathcal{B}(\mathcal{H}, \mathcal{K})$ with a number $\lambda \in \mathbb{K}$ by multiplying, at every point $x \in \mathcal{H}$, the value $Ax$ with $\lambda$,

$$(\lambda \cdot A)x = \lambda \cdot (Ax) \qquad \forall\, x \in \mathcal{H}.$$

In future we will follow the tradition and write this scalar multiplication $\lambda \cdot A$ simply as $\lambda A$. Since the target space $\mathcal{K}$ is a vector space it is clear that with this addition and scalar multiplication the set $\mathcal{B}(\mathcal{H}, \mathcal{K})$ becomes a vector space over the field $\mathbb{K}$. The details are filled in as an exercise.


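Proposition 22.2 For two Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$ over the field $\mathbb{K}$ the set $\mathcal{B}(\mathcal{H}, \mathcal{K})$ of all bounded linear operators $A : \mathcal{H} \to \mathcal{K}$ is a vector space over the field $\mathbb{K}$. The function $A \mapsto \|A\|$ defined by

$$\|A\| = \sup \{\|Ax\|_{\mathcal{K}} : x \in \mathcal{H},\ \|x\|_{\mathcal{H}} = 1\}$$

is a norm on $\mathcal{B}(\mathcal{H}, \mathcal{K})$.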


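Proof The first part of the proof has been given above. In order to prove that the function $A \mapsto \|A\|$ actually is a norm on the vector space $\mathcal{B}(\mathcal{H}, \mathcal{K})$, recall that for any $A, B \in \mathcal{B}(\mathcal{H}, \mathcal{K})$ and any $x \in \mathcal{H}$ one knows $\|Ax\|_{\mathcal{K}} \le \|A\|\,\|x\|_{\mathcal{H}}$ and $\|Bx\|_{\mathcal{K}} \le \|B\|\,\|x\|_{\mathcal{H}}$, and it follows that

$$\|(A + B)x\|_{\mathcal{K}} = \|Ax + Bx\|_{\mathcal{K}} \le \|Ax\|_{\mathcal{K}} + \|Bx\|_{\mathcal{K}} \le \|A\|\,\|x\|_{\mathcal{H}} + \|B\|\,\|x\|_{\mathcal{H}}.$$

Hence the estimate

$$\|A + B\| = \sup \{\|(A + B)x\|_{\mathcal{K}} : x \in \mathcal{H},\ \|x\|_{\mathcal{H}} = 1\} \le \|A\| + \|B\|$$

is immediate. The rule $\|\lambda A\| = |\lambda|\,\|A\|$ for all $\lambda \in \mathbb{K}$ and all $A \in \mathcal{B}(\mathcal{H}, \mathcal{K})$ is obvious from the definition. Finally, if $\|A\| = 0$ for $A \in \mathcal{B}(\mathcal{H}, \mathcal{K})$ then $\|Ax\|_{\mathcal{K}} = 0$ for all $x \in \mathcal{H}$, and hence $Ax = 0$ for all $x \in \mathcal{H}$, i.e., $A = 0$. We conclude that $\|\cdot\|$ is a norm on $\mathcal{B}(\mathcal{H}, \mathcal{K})$.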


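Proposition 22.3 Let $\mathcal{H}$ and $\mathcal{K}$ be two Hilbert spaces over the field $\mathbb{K}$. Every operator $A \in \mathcal{B}(\mathcal{H}, \mathcal{K})$ has an adjoint $A^*$ which is a bounded linear operator $\mathcal{K} \to \mathcal{H}$. The map $A \mapsto A^*$ has the following properties:

a) $A^{**} = A$ for all $A \in \mathcal{B}(\mathcal{H}, \mathcal{K})$
b) $(A + B)^* = A^* + B^*$ for all $A, B \in \mathcal{B}(\mathcal{H}, \mathcal{K})$
c) $(\lambda A)^* = \overline{\lambda} A^*$ for all $A \in \mathcal{B}(\mathcal{H}, \mathcal{K})$ and all $\lambda \in \mathbb{K}$
d) $\|A^*\| = \|A\|$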


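Proof Take any $A \in \mathcal{B}(\mathcal{H}, \mathcal{K})$. For all $x \in \mathcal{H}$ and all $y \in \mathcal{K}$ the estimate

$$|\langle y, Ax \rangle_{\mathcal{K}}| \le \|A\|\,\|x\|_{\mathcal{H}}\,\|y\|_{\mathcal{K}}$$

holds. Fix $y \in \mathcal{K}$. Then this estimate says that $x \mapsto \langle y, Ax \rangle_{\mathcal{K}}$ is a continuous linear functional on $\mathcal{H}$, hence by the Theorem of Riesz-Fréchet, there is a unique $y^* \in \mathcal{H}$ such that this functional is of the form $x \mapsto \langle y^*, x \rangle_{\mathcal{H}}$, i.e.,

$$\langle y, Ax \rangle_{\mathcal{K}} = \langle y^*, x \rangle_{\mathcal{H}} \qquad \forall\, x \in \mathcal{H}.$$

In this way we get a map $y \mapsto y^*$ from $\mathcal{K}$ into $\mathcal{H}$, which is called the adjoint $A^*$ of $A$, i.e., $A^* y = y^*$. This gives, for all $x \in \mathcal{H}$ and all $y \in \mathcal{K}$, the identity

$$\langle y, Ax \rangle_{\mathcal{K}} = \langle A^* y, x \rangle_{\mathcal{H}}.$$

Linearity of $A^*$ is evident from this identity. For the norm of $A^*$ one finds

$$\begin{aligned}
\|A^*\| &= \sup \{\|A^* y\|_{\mathcal{H}} : y \in \mathcal{K},\ \|y\|_{\mathcal{K}} = 1\} \\
&= \sup \{|\langle A^* y, x \rangle_{\mathcal{H}}| : x \in \mathcal{H},\ \|x\|_{\mathcal{H}} = 1,\ y \in \mathcal{K},\ \|y\|_{\mathcal{K}} = 1\} \\
&= \sup \{|\langle y, Ax \rangle_{\mathcal{K}}| : x \in \mathcal{H},\ \|x\|_{\mathcal{H}} = 1,\ y \in \mathcal{K},\ \|y\|_{\mathcal{K}} = 1\} \\
&= \|A\|.
\end{aligned}$$

Hence $A^*$ is bounded and Part (d) follows.
The bi-adjoint $A^{**} = (A^*)^*$ is defined in the same way as a bounded linear operator $\mathcal{H} \to \mathcal{K}$ through the identity

$$\langle y, A^{**} x \rangle_{\mathcal{K}} = \langle A^* y, x \rangle_{\mathcal{H}}$$

for all $y \in \mathcal{K}$ and all $x \in \mathcal{H}$. But by definition of the adjoint $A^*$ both terms are equal to $\langle y, Ax \rangle_{\mathcal{K}}$. We deduce $A^{**} = A$. Parts (b) and (c) are easy calculations and are left as an exercise.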
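In the finite-dimensional case $\mathcal{H} = \mathbb{C}^n$, $\mathcal{K} = \mathbb{C}^m$ the adjoint defined above is the conjugate transpose matrix. The following sketch checks the defining identity $\langle y, Ax \rangle_{\mathcal{K}} = \langle A^* y, x \rangle_{\mathcal{H}}$ on one example; the helper names are hypothetical, and the inner product is taken antilinear in the first argument, as in the text.

```python
def inner(u, v):
    """Inner product <u, v>, antilinear in u and linear in v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def apply(a, x):
    """Image A x of the vector x under the (rows x cols) matrix a."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

def adjoint(a):
    """Conjugate transpose of a (rows x cols) matrix, a (cols x rows) matrix."""
    rows, cols = len(a), len(a[0])
    return [[a[i][j].conjugate() for i in range(rows)] for j in range(cols)]

# Example operator A : C^2 -> C^3 and test vectors (arbitrary illustrative values).
A = [[1 + 2j, 0j], [3 + 0j, 1j], [2 - 1j, 4 + 0j]]
x = [1 - 1j, 2j]
y = [1j, 2 + 0j, -1 + 1j]

lhs = inner(y, apply(A, x))           # <y, A x> computed in K = C^3
rhs = inner(apply(adjoint(A), y), x)  # <A* y, x> computed in H = C^2
```

Both `lhs` and `rhs` agree up to rounding, and applying `adjoint` twice returns the original matrix, in line with Part (a).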

In Proposition 22.2 we learned that the space of all bounded linear operators from a Hilbert space $\mathcal{H}$ into a Hilbert space $\mathcal{K}$ is a normed space. This is actually true under considerably weaker assumptions when the Hilbert spaces are replaced by normed spaces $X$ and $Y$ over the same field. In this case a linear map $A : X \to Y$ is bounded if, and only if, there is a $C \in \mathbb{R}^+$ such that $\|Ax\|_Y \le C\|x\|_X$ for all $x \in X$. Then the norm of $A$ is defined as in the case of Hilbert spaces: $\|A\| = \sup \{\|Ax\|_Y : x \in X,\ \|x\|_X = 1\}$. Thus we arrive at the normed space $\mathcal{L}(X, Y)$ of bounded linear operators $X \to Y$. If the target space $Y$ is complete, then this space is complete too, a very widely used result. Certainly, this applies also to the case $\mathcal{B}(\mathcal{H}, \mathcal{K})$ of Hilbert spaces.


Theorem 22.2 Let X and Y be normed spaces over the field K. If Y is complete, 
then the normed space L(X, Y) is also complete. 


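Proof The proof that $\mathcal{L}(X, Y)$ is a normed space is the same as for the case of Hilbert spaces. Therefore we prove here completeness of this space.

If $(A_n)_{n \in \mathbb{N}} \subset \mathcal{L}(X, Y)$ is a Cauchy sequence, then for every $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $\|A_n - A_m\| \le \varepsilon$ for all $n, m \ge n_0$. Now take any $x \in X$ and consider the sequence $(A_n x)_{n \in \mathbb{N}} \subset Y$. Since $\|A_n x - A_m x\|_Y = \|(A_n - A_m)x\|_Y \le \|A_n - A_m\|\,\|x\|_X$, this sequence is a Cauchy sequence in $Y$ and thus converges to a unique element $y = y(x) \in Y$, $y(x) = \lim_{n \to \infty} A_n x$. The rules of calculation for limits imply that $x \mapsto y(x)$ is a linear function $A : X \to Y$, $Ax = \lim_{n \to \infty} A_n x$. $A$ is bounded too: Since $\|A_n x - A_m x\|_Y \le \varepsilon \|x\|_X$ for all $n, m \ge n_0$ it follows that, by taking the limit $n \to \infty$, $\|Ax - A_m x\|_Y \le \varepsilon \|x\|_X$ and thus for fixed $m \ge n_0$

$$\|Ax\|_Y \le \|Ax - A_m x\|_Y + \|A_m x\|_Y \le (\varepsilon + \|A_m\|)\,\|x\|_X,$$

i.e., $A$ is bounded. The same estimate $\|Ax - A_m x\|_Y \le \varepsilon \|x\|_X$ shows $\|A - A_m\| \le \varepsilon$ for all $m \ge n_0$, so the sequence converges to $A$ in $\mathcal{L}(X, Y)$ and the proof is complete.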


Corollary 22.2 Let $X$ be a normed space over the field $\mathbb{K}$. Then the topological dual $X' = \mathcal{L}(X, \mathbb{K})$ is complete.


Proof The field $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$ is complete so that the previous theorem applies.


22.4 The C*-Algebra B(H) 


The case of the Banach space $\mathcal{B}(\mathcal{H}, \mathcal{K})$ of bounded linear operators from a Hilbert space $\mathcal{H}$ into a Hilbert space $\mathcal{K}$ in which $\mathcal{K} = \mathcal{H}$ deserves special attention since there some additional important structure is available, namely one can naturally define a product through the composition. Following the tradition, the Banach space $\mathcal{B}(\mathcal{H}, \mathcal{H})$ is denoted by $\mathcal{B}(\mathcal{H})$. For $A, B \in \mathcal{B}(\mathcal{H})$ the composition $A \circ B : \mathcal{H} \to \mathcal{H}$ is again a bounded linear operator from $\mathcal{H}$ into itself since for all $x \in \mathcal{H}$ we have $\|A \circ Bx\|_{\mathcal{H}} = \|A(Bx)\|_{\mathcal{H}} \le \|A\|\,\|Bx\|_{\mathcal{H}} \le \|A\|\,\|B\|\,\|x\|_{\mathcal{H}}$. This composition is used to define a product on $\mathcal{B}(\mathcal{H})$:

$$A \cdot B = A \circ B \qquad \forall\, A, B \in \mathcal{B}(\mathcal{H}).$$

The standard rules of composition of functions and the fact that the functions involved are linear imply that this product satisfies the following relations, for all $A, B, C \in \mathcal{B}(\mathcal{H})$:

$$(A \cdot B) \cdot C = A \cdot (B \cdot C), \qquad (A + B) \cdot C = A \cdot C + B \cdot C, \qquad A \cdot (B + C) = A \cdot B + A \cdot C,$$

i.e., this product is associative and distributive but in general not commutative. One also has $A \cdot (\lambda B) = \lambda (A \cdot B)$. Equipped with this product the Banach space $\mathcal{B}(\mathcal{H})$ is a normed algebra.

According to Proposition 22.3 every $A \in \mathcal{B}(\mathcal{H})$ has an adjoint $A^* \in \mathcal{B}(\mathcal{H})$. Products in $\mathcal{B}(\mathcal{H})$ are transformed according to the following rule, which is shown in the Exercises:

$$(A \cdot B)^* = B^* \cdot A^* \qquad \forall\, A, B \in \mathcal{B}(\mathcal{H}).$$

As a matter of convenience we omit the '$\cdot$' for this product and write accordingly $AB \equiv A \cdot B$.


Theorem 22.3 Let $\mathcal{H}$ be a Hilbert space. Then the space $\mathcal{B}(\mathcal{H})$ of all bounded linear operators $A : \mathcal{H} \to \mathcal{H}$ is a C*-algebra, i.e., a complete normed algebra with involution $*$. For all $A, B \in \mathcal{B}(\mathcal{H})$ one has

a) $\|AB\| \le \|A\|\,\|B\|$
b) $\|A^*\| = \|A\|$
c) $\|AA^*\| = \|A^*A\| = \|A\|^2$
d) $\|I_{\mathcal{H}}\| = 1$

If the dimension of $\mathcal{H}$ is larger than 1, then the algebra $\mathcal{B}(\mathcal{H})$ is non-Abelian.


Proof Parts (a) and (b) have been shown above. Part (d) is trivial.
By (a) and (b) we know $\|A^*A\| \le \|A^*\|\,\|A\| = \|A\|^2$. The estimate

$$\|Ax\|^2 = \langle Ax, Ax \rangle = \langle x, A^*Ax \rangle \le \|x\|\,\|A^*Ax\| \le \|A^*A\|\,\|x\|^2$$

implies $\|A\|^2 \le \|A^*A\|$ and thus $\|A\|^2 = \|A^*A\|$. Because of (b) we can exchange $A^*$ and $A$ and Part (c) holds. Multiplication of $2 \times 2$-matrices is already not commutative.

Theorem 22.3 states that B(H) is a complete normed algebra with involution. In 
this statement it is the norm or uniform topology to which we refer. However, there 
are important problems when weaker topologies on B(H) are needed. Accordingly, 
we discuss briefly weaker topologies on this space. 




In order to put these topologies into perspective we recall the definition of neighborhoods for the norm topology. Neighborhoods of a point $A \in \mathcal{B}(\mathcal{H})$ for the norm topology are all sets which contain a set of the form

$$U_r(A) = \{B \in \mathcal{B}(\mathcal{H}) : \|B - A\| < r\}$$

for some $r > 0$. A basis of neighborhoods at the point $A \in \mathcal{B}(\mathcal{H})$ for the strong topology on $\mathcal{B}(\mathcal{H})$ are the sets

$$U_{r, y_1, \dots, y_n}(A) = \{B \in \mathcal{B}(\mathcal{H}) : \|(B - A)y_j\|_{\mathcal{H}} < r,\ j = 1, \dots, n\}$$

with $r > 0$ and any finite collection of points $y_1, \dots, y_n \in \mathcal{H}$. Finally a basis of neighborhoods at $A \in \mathcal{B}(\mathcal{H})$ for the weak topology on $\mathcal{B}(\mathcal{H})$ are the sets

$$U_{r, y_1, \dots, y_n, x_1, \dots, x_n}(A) = \{B \in \mathcal{B}(\mathcal{H}) : |\langle x_j, (B - A)y_j \rangle| < r,\ j = 1, \dots, n\}$$

for $r > 0$ and any finite collection of points $x_j, y_j \in \mathcal{H}$, $j = 1, \dots, n$.

In practice we will not be using the definitions of these topologies in terms of a 
neighborhood basis but the notions of convergence which these definitions imply. 
Therefore, we state these explicitly. 


Definition 22.1 A sequence $(A_n)_{n \in \mathbb{N}} \subset \mathcal{B}(\mathcal{H})$ converges to $A \in \mathcal{B}(\mathcal{H})$ with respect to the

a) Norm topology if, and only if, $\lim_{n \to \infty} \|A - A_n\| = 0$
b) Strong topology if, and only if, $\lim_{n \to \infty} \|Ax - A_n x\|_{\mathcal{H}} = 0$ for every $x \in \mathcal{H}$
c) Weak topology if, and only if, $\lim_{n \to \infty} |\langle y, Ax \rangle - \langle y, A_n x \rangle| = 0$ for every pair of points $x, y \in \mathcal{H}$


The estimate $\|Ax - A_n x\|_{\mathcal{H}} \le \|A - A_n\|\,\|x\|_{\mathcal{H}}$ shows that norm convergence always implies strong convergence and similarly, according to the estimate $|\langle y, Ax \rangle - \langle y, A_n x \rangle| \le \|y\|_{\mathcal{H}}\,\|Ax - A_n x\|_{\mathcal{H}}$, strong convergence always implies weak convergence. The converses of these statements do not hold. The norm topology is really stronger than the strong topology, which in turn is stronger than the weak topology. The terminology is thus consistent.

Some examples will help to explain the differences between these topologies. On the Hilbert space $\mathcal{H} = \ell^2(\mathbb{C})$ consider the operator $S_n$ which replaces the first $n$ elements of the sequence $x = (x_1, \dots, x_n, x_{n+1}, \dots)$ by 0,

$$S_n x = (0, \dots, 0, x_{n+1}, x_{n+2}, \dots).$$

The norm of $S_n$ is easily calculated: $\|S_n\| = 1$ for all $n \in \mathbb{N}$. Thus $(S_n)_{n \in \mathbb{N}}$ does not converge to 0 in norm. But this sequence converges to 0 in the strong topology since for any $x \in \ell^2(\mathbb{C})$ we find $\|S_n x\|_2^2 = \sum_{j=n+1}^{\infty} |x_j|^2 \to 0$ as $n \to \infty$.

Next define a bounded operator $W_n : \ell^2(\mathbb{C}) \to \ell^2(\mathbb{C})$ by

$$W_n x = (0, \dots, 0, x_1, x_2, \dots),$$

i.e., $W_n$ shifts $x = (x_1, x_2, \dots)$ by $n$ places to $\infty$. Clearly $\|W_n x\|_2 = \|x\|_2$ for all $x \in \ell^2(\mathbb{C})$. Now take any $y \in \ell^2(\mathbb{C})$ and calculate $\langle y, W_n x \rangle_2 = \sum_{j=n+1}^{\infty} \overline{y_j}\, x_{j-n}$, hence $|\langle y, W_n x \rangle_2|^2 \le \sum_{j=n+1}^{\infty} |y_j|^2\, \|x\|_2^2 \to 0$ as $n \to \infty$. This implies that the sequence $(W_n)_{n \in \mathbb{N}}$ converges to 0 in the weak but not in the strong topology.
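The two examples can be tried out on sequences with finitely many nonzero entries, which certainly belong to $\ell^2(\mathbb{C})$. This is only a rough numerical sketch: the helper names `cut` (for $S_n$) and `shift` (for $W_n$), the test sequence $x_j = 1/j$ and the cutoff at 200 entries are all assumptions made for the illustration.

```python
from math import sqrt

def norm(x):
    """l^2 norm of a finitely supported sequence stored as a list."""
    return sqrt(sum(abs(c) ** 2 for c in x))

def inner(y, x):
    """<y, x> on l^2; zip stops at the shorter list, i.e. missing entries count as 0."""
    return sum(a.conjugate() * b for a, b in zip(y, x))

def cut(n, x):
    """S_n x: replace the first n entries by 0."""
    return [0.0] * min(n, len(x)) + list(x[n:])

def shift(n, x):
    """W_n x: shift the sequence by n places to the right."""
    return [0.0] * n + list(x)

x = [1.0 / j for j in range(1, 201)]   # x_j = 1/j for j <= 200, zero afterwards

tail_norms = [norm(cut(n, x)) for n in (0, 10, 100, 200)]      # ||S_n x|| decreases to 0
shift_norms = [norm(shift(n, x)) for n in (0, 10, 100, 200)]   # ||W_n x|| stays ||x||
pairings = [abs(inner(x, shift(n, x))) for n in (0, 10, 100, 200)]  # |<x, W_n x>| decreases to 0
```

The list `tail_norms` decreases to zero (strong convergence of $S_n$), `shift_norms` is constant (no strong convergence of $W_n$ to 0), while `pairings` decreases to zero, as weak convergence of $W_n$ to 0 requires.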

Finally we address the question whether these three topologies we have introduced on the C*-algebra $\mathcal{B}(\mathcal{H})$ are compatible with the algebra operations. The answer is given in Proposition 22.4.


Proposition 22.4 Let $\mathcal{B}(\mathcal{H})$ be the C*-algebra of bounded linear operators on a Hilbert space $\mathcal{H}$. Then the following holds:

a) Addition and scalar multiplication are continuous with respect to the norm, the strong and the weak topology on $\mathcal{B}(\mathcal{H})$.
b) The product $(A, B) \mapsto AB$ is continuous with respect to the norm topology.
c) The involution $A \mapsto A^*$ is continuous with respect to the norm and the weak topology.

Continuity with respect to a topology not mentioned in statements a) to c) is in general not given.


Proof All three topologies we have introduced on $\mathcal{B}(\mathcal{H})$ are locally convex topologies on a vector space. Thus Part (a) is trivial. The estimate $\|AB\| \le \|A\|\,\|B\|$ for all $A, B \in \mathcal{B}(\mathcal{H})$ implies continuity of the product with respect to the norm topology. Since $\|A^* - B^*\| = \|(A - B)^*\| = \|A - B\|$ by Proposition 22.3, the involution is continuous with respect to the norm topology.
Suppose a sequence $(A_n)_{n \in \mathbb{N}} \subset \mathcal{B}(\mathcal{H})$ converges weakly to $A \in \mathcal{B}(\mathcal{H})$. Then the sequence of adjoints $(A_n^*)_{n \in \mathbb{N}}$ converges weakly to $A^*$ since for every pair $x, y \in \mathcal{H}$ we have, as $n \to \infty$,

$$\langle A_n^* x, y \rangle = \langle x, A_n y \rangle \to \langle x, Ay \rangle = \langle A^* x, y \rangle.$$

Explicit examples in infinite dimensional Hilbert spaces show that the involution $A \mapsto A^*$ is not continuous with respect to the strong topology and that the multiplication is not continuous with respect to the strong and the weak topology. These counterexamples are done as exercises.

The fundamental role which C*-algebras play in local quantum physics is explained in full detail in [1].


22.5 Calculus in the C*-Algebra B(H) 


22.5.1 Preliminaries 


On the C*-algebra $\mathcal{B}(\mathcal{H})$ one can do calculus since we can add and multiply elements and one can take limits. With these operations one can calculate certain functions $f(A)$ of elements $A \in \mathcal{B}(\mathcal{H})$. Suppose that $f$ is analytic in the disk $|z| < R$ for some $R > 0$. Then $f$ has a power series expansion $\sum_{n=0}^{\infty} a_n z^n$, which converges for $|z| < R$, i.e., $f(z) = \lim_{N \to \infty} f_N(z)$ where $f_N(z) = \sum_{n=0}^{N} a_n z^n$ is a partial sum. For any $A \in \mathcal{B}(\mathcal{H})$ the polynomial

$$f_N(A) = \sum_{n=0}^{N} a_n A^n$$

is certainly a well-defined element in $\mathcal{B}(\mathcal{H})$. And so is the limit in the norm topology of $\mathcal{B}(\mathcal{H})$ if it exists. We claim: For $A \in \mathcal{B}(\mathcal{H})$, $\|A\| < R$, this sequence of partial sums has a limit in $\mathcal{B}(\mathcal{H})$. It suffices to show that this sequence is a Cauchy sequence. Since the power series converges absolutely inside its disk of convergence, given $\varepsilon > 0$ and $\|A\| \le r < R$ there is $n_0 \in \mathbb{N}$ such that $\sum_{j=n+1}^{m} |a_j|\,|z|^j < \varepsilon$ for all $m \ge n \ge n_0$ and all $|z| \le r$. Therefore,

$$\|f_m(A) - f_n(A)\| = \Big\| \sum_{j=n+1}^{m} a_j A^j \Big\| \le \sum_{j=n+1}^{m} |a_j|\,\|A\|^j < \varepsilon \qquad \text{for all } m \ge n \ge n_0,$$

and this sequence is indeed a Cauchy sequence and thus converges to a unique element $f(A) \in \mathcal{B}(\mathcal{H})$, usually written as

$$f(A) = \sum_{n=0}^{\infty} a_n A^n.$$

Let us consider two well-known examples. The geometric series $\sum_{n=0}^{\infty} z^n$ is known to converge for $|z| < 1$ to the function $(1 - z)^{-1}$. Hence, for every $A \in \mathcal{B}(\mathcal{H})$, $\|A\| < 1$, we get ($I = I_{\mathcal{H}}$, the identity operator on $\mathcal{H}$)

$$(I - A)^{-1} = \sum_{n=0}^{\infty} A^n. \qquad (22.4)$$

The operator series $\sum_{n=0}^{\infty} A^n$ is often called the Neumann series. It was first introduced in the study of integral equations to calculate the inverse of $I - A$.
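For a matrix $A$ with $\|A\| < 1$ the partial sums of the Neumann series (22.4) can be summed directly. The following is a minimal sketch under that assumption; the function names and the sample matrix are hypothetical, and the number of terms is simply chosen large enough for this example.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_inverse(a, terms=200):
    """Partial sum I + A + A^2 + ... + A^terms, approximating (I - A)^(-1) when ||A|| < 1."""
    n = len(a)
    total = identity(n)
    power = identity(n)
    for _ in range(terms):
        power = matmul(power, a)    # power now holds the next power of A
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Sample matrix; all row and column sums are below 1, hence ||A|| < 1.
A = [[0.2, 0.1], [0.3, 0.4]]
approx_inverse = neumann_inverse(A)
i_minus_a = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)] for i in range(2)]
check = matmul(i_minus_a, approx_inverse)   # should be close to the identity matrix
```

The product `check` of $I - A$ with the partial sum differs from the identity by $A^{201}$, which is negligible here; for $\|A\|$ close to 1 many more terms would be needed.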

Another important series is the exponential series $\sum_{n=0}^{\infty} \frac{1}{n!} z^n$, which is known to have a radius of convergence $R = \infty$. Hence for every $A \in \mathcal{B}(\mathcal{H})$

$$e^A = \sum_{n=0}^{\infty} \frac{1}{n!} A^n \qquad (22.5)$$

is a well-defined element in $\mathcal{B}(\mathcal{H})$. If $A, B \in \mathcal{B}(\mathcal{H})$ commute, i.e., $AB = BA$, then one can show, as for complex numbers, $e^{A+B} = e^A e^B$. As a special case consider $U(t) = e^{tA}$ for $t \in \mathbb{C}$ for some fixed $A \in \mathcal{B}(\mathcal{H})$. One finds

$$U(t + s) = U(t)U(s) \quad \forall\, t, s \in \mathbb{C}, \qquad U(0) = I.$$

This family of operators $U(t) \in \mathcal{B}(\mathcal{H})$, $t \in \mathbb{C}$, has interesting applications for the solution of differential equations in $\mathcal{H}$. Take some $x_0 \in \mathcal{H}$ and consider the function $x : \mathbb{C} \to \mathcal{H}$,

$$x(t) = U(t)x_0 = e^{tA} x_0, \qquad t \in \mathbb{C}.$$

We have $x(0) = x_0$ and for $t, s \in \mathbb{C}$

$$x(t) - x(s) = e^{sA}\big[e^{(t-s)A} x_0 - x_0\big].$$

In the Exercises one proves, as an identity in $\mathcal{H}$,

$$\lim_{t \to s} \frac{x(t) - x(s)}{t - s} = Ax(s),$$

i.e., the function $x(t)$ is differentiable (actually it is analytic) and satisfies the differential equation

$$x'(t) = Ax(t), \quad t \in \mathbb{C}, \qquad x(0) = x_0.$$

Therefore, $x(t) = e^{tA} x_0$ is a solution of the initial value problem $x'(t) = Ax(t)$, $x(0) = x_0$.

Such differential equations are used often for the description of the time evolution of physical systems. Compared to the time evolution of systems in classical mechanics the exponential bound $\|x(t)\| \le e^{|t|\,\|A\|}\,\|x_0\|_{\mathcal{H}}$ for all $t \in \mathbb{R}$ corresponds to the case of bounded vector fields governing the time evolution.
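The exponential series and the group law $U(t+s) = U(t)U(s)$ can be illustrated for a $2 \times 2$ matrix. In this sketch the generator $A$ is chosen (as an assumption of the example) to be the matrix with $A^2 = -I$, for which $e^{tA}$ is the rotation matrix with entries $\cos t$ and $\pm \sin t$; the function names are hypothetical.

```python
from math import cos, sin

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def exp_series(a, t=1.0, terms=40):
    """Partial sum of sum_k (t A)^k / k! up to k = terms."""
    n = len(a)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms + 1):
        # term becomes (t A)^k / k! from (t A)^(k-1) / (k-1)!
        term = [[t * v / k for v in row] for row in matmul(term, a)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def max_abs_diff(a, b):
    """Largest entrywise difference between two matrices of equal size."""
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a)))

GENERATOR = [[0.0, 1.0], [-1.0, 0.0]]   # satisfies A^2 = -I

def rotation(t):
    """Closed form of exp(t A) for the generator above."""
    return [[cos(t), sin(t)], [-sin(t), cos(t)]]

group_law_error = max_abs_diff(
    exp_series(GENERATOR, 1.2),
    matmul(exp_series(GENERATOR, 0.7), exp_series(GENERATOR, 0.5)),
)
```

With 40 terms the series agrees with `rotation(t)` to rounding accuracy for moderate $t$, and `group_law_error` is of the same size.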


22.5.2 Polar Decomposition of Operators 


Recall the polar representation of a complex number $z = e^{i \arg z}|z|$ where the modulus of $z$ is the positive square root of the product of the complex number and its complex conjugate: $|z| = \sqrt{\overline{z} z}$. In this section we will present an analog for bounded linear operators on a Hilbert space, called the polar decomposition. In a first step the square root of a positive operator is defined using the power series representation of the square root, a result which is of great interest on its own. Thus one can define the modulus $|A|$ of a bounded linear operator $A$ as the positive square root of $A^*A$. The phase factor in the polar decomposition of complex numbers will be replaced in the case of operators by a partial isometry, i.e., an operator which is isometric on the orthogonal complement of its null space.

It is a well-known fact (see also the Exercises) that the Taylor expansion at $z = 0$ of the function $z \mapsto \sqrt{1 - z}$ converges absolutely for $|z| \le 1$:

$$\sqrt{1 - z} = 1 - \sum_{j=1}^{\infty} a_j z^j \qquad \forall\, |z| \le 1. \qquad (22.6)$$

The coefficients $a_j$ of this expansion are all positive and known explicitly. Similarly to the previous two examples this power series will be used to define the square root of a positive linear operator.


Theorem 22.4 (Square Root Lemma) Let $A \in \mathcal{B}(\mathcal{H})$ be positive, i.e., $0 \le \langle x, Ax \rangle$ for all $x \in \mathcal{H}$. Then there is a unique positive operator $B \in \mathcal{B}(\mathcal{H})$ such that $B^2 = A$. This operator $B$ commutes with every bounded linear operator which commutes with $A$. One calls $B$ the positive square root of $A$ and writes $B = \sqrt{A}$.

If $\|A\| \le 1$, then $\sqrt{A}$ has the norm convergent power series expansion

$$\sqrt{A} = \sqrt{I - (I - A)} = I - \sum_{j=1}^{\infty} a_j (I - A)^j \qquad (22.7)$$

where the coefficients are those of Eq. (22.6). The general case is easily reduced to this one.



Proof For a positive operator $A$ of norm $\le 1$ one has $\|I - A\| = \sup_{\|x\| = 1} |\langle x, (I - A)x \rangle| \le 1$. Hence we know that the series in Eq. (22.7) converges in norm to some bounded linear operator $B$. Since the square of the series (22.6) is known to be $1 - z$, the square of the series (22.7) is $I - (I - A) = A$, thus $B^2 = A$.

In order to show positivity of $B$ observe that $0 \le I - A \le I$ implies $0 \le \langle x, (I - A)^n x \rangle \le 1$ for all $x \in \mathcal{H}$, $\|x\| = 1$. The series (22.7) for $B$ implies that

$$\langle x, Bx \rangle = \langle x, x \rangle - \sum_{j=1}^{\infty} a_j \langle x, (I - A)^j x \rangle \ge 1 - \sum_{j=1}^{\infty} a_j \ge 0$$

where in the last step the estimate $\sum_{j=1}^{\infty} a_j \le 1$ is used (see Exercises). Therefore $B \ge 0$.

The partial sums of the series (22.7) commute obviously with every bounded operator which commutes with $A$. Thus the norm limit $B$ does the same.

Suppose $0 \le C \in \mathcal{B}(\mathcal{H})$ satisfies $C^2 = A$. Then $CA = CC^2 = AC$, thus $C$ commutes with $A$ and hence with $B$. Calculate $(B - C)B(B - C) + (B - C)C(B - C) = (B^2 - C^2)(B - C) = 0$ and note that the two summands are positive operators, hence both of them vanish and so does their difference $(B - C)B(B - C) - (B - C)C(B - C) = (B - C)^3 = 0$. It follows that $\|B - C\|^4 = \|(B - C)^4\| = 0$, since $B - C$ is self-adjoint. We conclude $B - C = 0$.


Definition 22.2 The function $|\cdot| : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ defined by $|A| = \sqrt{A^*A}$ for all $A \in \mathcal{B}(\mathcal{H})$ is called the modulus. Its values are positive bounded operators.
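The power series (22.7) of the Square Root Lemma can be evaluated numerically for a positive matrix of norm at most 1. The sketch below relies on the recursion $a_1 = \tfrac12$, $a_j = a_{j-1}\,\frac{2j-3}{2j}$ for the coefficients of (22.6), which follows from the binomial series; the function names, the sample matrix and the number of terms are assumptions of the example.

```python
def sqrt_coefficients(count):
    """Coefficients a_1, ..., a_count in sqrt(1 - z) = 1 - sum_j a_j z^j."""
    coeffs = []
    a = 0.5                                  # a_1 = 1/2
    for j in range(1, count + 1):
        if j > 1:
            a *= (2 * j - 3) / (2 * j)       # a_j = a_{j-1} (2j - 3) / (2j)
        coeffs.append(a)
    return coeffs

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def sqrt_positive(a, terms=400):
    """Partial sum I - sum_{j <= terms} a_j (I - A)^j for a positive matrix A with ||A|| <= 1."""
    n = len(a)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [[eye[i][j] - a[i][j] for j in range(n)] for i in range(n)]   # D = I - A
    result = [row[:] for row in eye]
    power = [row[:] for row in eye]
    for coeff in sqrt_coefficients(terms):
        power = matmul(power, d)             # next power of D
        result = [[result[i][j] - coeff * power[i][j] for j in range(n)] for i in range(n)]
    return result

# Sample positive matrix with eigenvalues strictly between 0 and 1.
A = [[0.5, 0.2], [0.2, 0.6]]
B = sqrt_positive(A)
B_squared = matmul(B, B)    # should reproduce A up to truncation and rounding
```

For this matrix the eigenvalues of $I - A$ are well below 1, so the series converges geometrically; when $A$ has an eigenvalue near 0 the convergence is much slower and far more terms are required.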


Theorem 22.5 (Polar Decomposition) For every bounded linear operator $A$ on the Hilbert space $\mathcal{H}$ the polar decomposition

$$A = U|A| \qquad (22.8)$$

holds. Here $|A|$ is the modulus of $A$ and $U$ is a partial isometry with null space $N(U) = N(A)$. $U$ is uniquely determined by this condition and its range is $\overline{\operatorname{ran} A}$.


Proof The definition of the modulus implies for all $x \in \mathcal{H}$

$$\| |A| x \|^2 = \langle x, |A|^2 x \rangle = \langle x, A^*Ax \rangle = \|Ax\|^2,$$

hence $N(A) = N(|A|) = (\operatorname{ran} |A|)^\perp$, and we have the orthogonal decomposition $\mathcal{H} = N(|A|) \oplus \overline{\operatorname{ran} |A|}$ of the Hilbert space. Now define a map $U : \mathcal{H} \to \mathcal{H}$ with $N(U) = N(|A|)$ by continuous extension of $U(|A|x) = Ax$ for all $x \in \mathcal{H}$. Because of the identity given above, $U$ is a well-defined linear operator which is isometric on $\overline{\operatorname{ran} |A|}$. Its range is $\overline{\operatorname{ran} A}$. On the basis of Eq. (22.8) and the condition $N(U) = N(A)$ the proof of uniqueness is straightforward.
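For an invertible real $2 \times 2$ matrix the polar decomposition can be written down explicitly, since then $U = A|A|^{-1}$ is an orthogonal matrix. The sketch below is only an illustration for this special case: it uses the closed formula $\sqrt{M} = (M + \sqrt{\det M}\, I)/\sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$ for a positive definite $2 \times 2$ matrix $M$ (a consequence of the Cayley-Hamilton theorem) instead of the power series of the text, and all function names are hypothetical.

```python
from math import sqrt

def matmul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    """Transpose, which is the adjoint of a real matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def modulus_2x2(a):
    """|A| = sqrt(A^T A) for an invertible real 2 x 2 matrix, via the closed formula."""
    m = matmul(transpose(a), a)
    root_det = sqrt(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    scale = sqrt(m[0][0] + m[1][1] + 2.0 * root_det)
    return [[(m[i][j] + (root_det if i == j else 0.0)) / scale for j in range(2)]
            for i in range(2)]

def inverse_2x2(a):
    """Inverse of an invertible 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def polar_2x2(a):
    """Return (U, |A|) with A = U |A| for an invertible real 2 x 2 matrix."""
    mod = modulus_2x2(a)
    return matmul(a, inverse_2x2(mod)), mod

A = [[1.0, 2.0], [0.0, 3.0]]
U, MOD = polar_2x2(A)
```

Here `matmul(U, MOD)` reproduces `A` and `matmul(transpose(U), U)` is the identity up to rounding. For a non-invertible matrix `inverse_2x2` fails, and $U$ is then only a partial isometry, as in the theorem.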


22.6 Exercises 


1. Prove Lemma 22.1. 
2. Prove that the operator of multiplication with the function $g(x) = x^{-\alpha}$, $2\alpha > 1$, is unbounded in the Hilbert space $L^2([0, 1])$.




Hints: Consider the functions 
Sn (x ) = 


For these functions one can calculate the relevant norms easily. 

3. Prove all the statements about the spin operators in the section on examples of 
bounded linear operators. 

4. Prove that the Wiener-Hopf operators are well-defined bounded linear operators in $L^2(\mathbb{R}^+)$.
Hints: Consider the space $L^2(\mathbb{R}^+)$ as a subspace of $L^2(\mathbb{R})$ and use the results on the relations between multiplication and convolution under Fourier transformation given in Part A, Chap. 10.

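5. Prove parts (b) and (c) of Proposition 22.3.

6. For $A, B \in \mathcal{B}(\mathcal{H})$ prove: $(A + B)^* = A^* + B^*$ and $(A \cdot B)^* = B^* \cdot A^*$.

7. For $A \in \mathcal{B}(\mathcal{H})$ and $x_0 \in \mathcal{H}$, define $x(t) = e^{tA} x_0$ for $t \in \mathbb{C}$ and show that this function $\mathbb{C} \to \mathcal{H}$ is differentiable on $\mathbb{C}$. Calculate its (complex) derivative.

8. In the Hilbert space $\mathcal{H} = \ell^2(\mathbb{C})$, denote by $e_j$, $j \in \mathbb{N}$, the standard basis vectors (the sequence $e_j$ has a 1 at position $j$, otherwise all elements are 0). Then every $x \in \ell^2(\mathbb{C})$ has the Fourier expansion $x = \sum_{j=1}^{\infty} x_j e_j$ with $(x_j)_{j \in \mathbb{N}}$ a square summable sequence of numbers. Define a bounded linear operator $A \in \mathcal{B}(\mathcal{H})$ by

$$A\Big(\sum_{j=1}^{\infty} x_j e_j\Big) = \sum_{j=2}^{\infty} x_j e_{j-1}$$

and show:
a) $A^*\big(\sum_{j=1}^{\infty} x_j e_j\big) = \sum_{j=1}^{\infty} x_j e_{j+1}$
b) The sequence $A_n = A^n$ converges to 0 in the strong topology
c) $A_n^* = (A^*)^n$ does not converge strongly to 0
d) $A_n A_n^* = I$ for all $n \in \mathbb{N}$
e) deduce that the product is continuous neither with respect to the strong nor with respect to the weak topology.

9. Though in general the involution is not strongly continuous on $\mathcal{B}(\mathcal{H})$ it is strongly continuous on a linear subspace $\mathcal{N}$ of normal operators in $\mathcal{B}(\mathcal{H})$, i.e., bounded operators with the property

$$A^*A = AA^*.$$

Prove: If $(A_n)_{n \in \mathbb{N}} \subset \mathcal{N}$ converges strongly to $A \in \mathcal{N}$ then the sequence of adjoints $(A_n^*)_{n \in \mathbb{N}}$ converges strongly to $A^*$.
Hints: Show first that $\|(A^* - A_n^*)x\|_{\mathcal{H}}^2 = \|(A - A_n)x\|_{\mathcal{H}}^2$ for $x \in \mathcal{H}$.

10. Show: The algebra $\mathfrak{A} = M_2(\mathbb{C})$ of complex $2 \times 2$ matrices is not a C*-algebra when it is equipped with the norm

$$\|A\| = \sqrt{\operatorname{Tr}(AA^*)} = \Big(\sum_{i,j=1}^{2} |A_{ij}|^2\Big)^{1/2}.$$

Hints: Take the matrix $A = \begin{pmatrix} 1 & i \\ 0 & 1 \end{pmatrix}$ and calculate $\|A^*\|^2$ and $\|AA^*\|^2$.

11. Show that the Taylor series of the function $f(z) = \sqrt{1 - z}$ at $z = 0$ is of the form

$$\sqrt{1 - z} = 1 - \sum_{j=1}^{\infty} a_j z^j, \qquad |z| < 1.$$

Prove: $\sum_{j=1}^{\infty} a_j \le 1$. Deduce that the above power series for $\sqrt{1 - z}$ converges for $|z| \le 1$.
Hints: For any $N \in \mathbb{N}$ write $\sum_{j=1}^{N} a_j = \lim_{x \to 1} \sum_{j=1}^{N} a_j x^j$ with $0 \le x < 1$. Since the coefficients are positive, one has $\sum_{j=1}^{N} a_j x^j \le \sum_{j=1}^{\infty} a_j x^j = 1 - \sqrt{1 - x}$.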


Reference


1. Haag R. Local quantum physics: fields, particles, algebras. 2nd ed. Texts and monographs in physics. Berlin: Springer-Verlag; 1998.


Chapter 23 
Special Classes of Linear Operators 


23.1 Projection Operators 


Let $e$ be a unit vector in a Hilbert space $\mathcal{H}$ over the field $\mathbb{K}$ with inner product $\langle \cdot, \cdot \rangle$. Define $P_e : \mathcal{H} \to \mathcal{H}$ by $P_e x = \langle e, x \rangle e$ for all $x \in \mathcal{H}$. Evidently, $P_e$ is a bounded linear operator with null space (kernel) $N(P_e) = \{e\}^\perp$ and range $\operatorname{ran} P_e = \mathbb{K} e$. In addition $P_e$ satisfies $P_e^* = P_e$ and $P_e^2 = P_e$, which is also elementary to prove. The operator $P_e$ is the simplest example of the class of projection operators or projectors to be studied in this section.


Definition 23.1 A bounded linear operator $P$ on a Hilbert space $\mathcal{H}$ which is symmetric, $P^* = P$, and idempotent, $P^2 = P$, is called a projector or projection operator.

The set of all projection operators on a Hilbert space $\mathcal{H}$ is denoted by $\mathcal{P}(\mathcal{H})$, i.e.,

$$\mathcal{P}(\mathcal{H}) = \{P \in \mathcal{B}(\mathcal{H}) : P^* = P = P^2\}.$$


With the help of the following proposition one can easily construct many examples 
of projectors explicitly. 


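Proposition 23.1 Let $\mathcal{H}$ be a Hilbert space over the field $\mathbb{K}$. Projectors on $\mathcal{H}$ have the following properties:

a) For every $P \in \mathcal{P}(\mathcal{H})$, $P \neq 0$, $\|P\| = 1$;
b) a bounded operator $P \in \mathcal{B}(\mathcal{H})$ is a projector if, and only if, $P^\perp = I - P$ is a projector;
c) if $P \in \mathcal{P}(\mathcal{H})$, then
$$\mathcal{H} = \operatorname{ran} P \oplus \operatorname{ran} P^\perp, \qquad P|_{\operatorname{ran} P} = I_{\operatorname{ran} P}, \qquad P|_{\operatorname{ran} P^\perp} = 0;$$
d) there is a one-to-one correspondence between projection operators $P$ on $\mathcal{H}$ and closed linear subspaces $M$ of $\mathcal{H}$, i.e., the range $\operatorname{ran} P$ of a projector $P$ is a closed linear subspace of $\mathcal{H}$, and conversely to every closed linear subspace $M \subset \mathcal{H}$ there is exactly one $P \in \mathcal{P}(\mathcal{H})$ such that the range of this projector is $M$;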


© Springer International Publishing Switzerland 2015 325 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_23 


e) Suppose that $\{e_n : n = 1, \dots, N\}$, $N \in \mathbb{N}$ or $N = +\infty$, is an orthonormal system in $\mathcal{H}$, then the projection operator onto the closed linear subspace $M = [\{e_n : n = 1, \dots, N\}]$ generated by the orthonormal system is

$$P_M x = \sum_{n=1}^{N} \langle e_n, x \rangle e_n \qquad \forall\, x \in \mathcal{H}. \qquad (23.1)$$

n=1 


Proof By definition any projector satisfies $P = P^*P$, thus by Theorem 22.3 $\|P\| = \|P^*P\| = \|P\|^2$, and therefore $\|P\| \in \{0, 1\}$ and Part a) follows.

To prove b) we show that the operator $I - P$ satisfies the defining relations of a projector ($Q = Q^* = Q^2$) if, and only if, $P$ does: $I - P = (I - P)^* = I - P^* \Leftrightarrow P = P^*$ and $I - P = (I - P)^2 = I - P - P + P^2 \Leftrightarrow -P + P^2 = 0$.

For the proof of Part c) observe that the relation $I = P + P^\perp$ implies immediately that every $x \in \mathcal{H}$ is the sum of an element in the range of $P$ and an element in the range of $P^\perp$. In Part d) we prove that the range of a projector is a closed linear subspace. Thus $\operatorname{ran} P \oplus \operatorname{ran} P^\perp$ gives indeed a decomposition of $\mathcal{H}$ into closed orthogonal subspaces. The image of $Px \in \operatorname{ran} P$ under $P$ is $PPx = Px$, since $P^2 = P$, and similarly, the image of $P^\perp x \in \operatorname{ran} P^\perp$ under $P$ is $PP^\perp x = P(I - P)x = 0$, thus the second and third statement in Part c) follow.

Let $P$ be a projector on $\mathcal{H}$ and $y$ an element in the closure of the range of $P$, i.e., there is a sequence $(x_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ such that $y = \lim_{n \to \infty} Px_n$. Since a projector is continuous we deduce $Py = \lim_{n \to \infty} PPx_n = \lim_{n \to \infty} Px_n = y$, thus $y \in \operatorname{ran} P$ and the range of a projector is a closed linear subspace.

Now given a closed linear subspace $M \subset \mathcal{H}$, we apply to each $x \in \mathcal{H}$ the Projection Theorem 16.1 to get a unique decomposition of $x$ into $u_x \in M$ and $v_x \in M^\perp$, $x = u_x + v_x$. The uniqueness condition allows us to conclude that the mapping $x \mapsto u_x$ is linear. Since $\|x\|^2 = \|u_x\|^2 + \|v_x\|^2 \ge \|u_x\|^2$ the linear map $P_M : \mathcal{H} \to M$ defined by $P_M x = u_x$ is bounded. Next apply the projection theorem to $x, y \in \mathcal{H}$ to get

$$\langle x, P_M y \rangle = \langle u_x + v_x, u_y \rangle = \langle u_x, u_y \rangle = \langle P_M x, P_M y \rangle = \langle P_M x, y \rangle,$$

hence $P_M = P_M^* = P_M^* P_M = P_M^2$ and thus $P_M$ is a projector. Per construction its range is the given closed subspace $M$. This proves Part d).

The proof of Part e) is done explicitly for the case $N = \infty$. Then the closed linear hull $M$ of the linear subspace generated by the given orthonormal system is described in Corollary 17.1 as

$$M = \Big\{ x \in \mathcal{H} : x = \sum_{n=1}^{\infty} c_n e_n,\ c_n \in \mathbb{K},\ \sum_{n=1}^{\infty} |c_n|^2 < \infty \Big\}.$$

Given $x \in \mathcal{H}$, Bessel's inequality (Corollary 15.1) states that $\sum_{n=1}^{\infty} |\langle e_n, x \rangle|^2 \le \|x\|^2$, hence $P_M x = \sum_{n=1}^{\infty} \langle e_n, x \rangle e_n \in M$ and $\|P_M x\| \le \|x\|$. It follows that $P_M$ is a bounded linear operator into $M$. By definition $P_M e_n = e_n$ and thus $P_M^2 x = P_M x$ for all $x \in \mathcal{H}$ and we conclude $P_M^2 = P_M$. Next we prove symmetry of the operator $P_M$. For all $x, y \in \mathcal{H}$ the following chain of identities holds using continuity of the inner product:

$$\langle x, P_M y \rangle = \Big\langle x, \sum_{n=1}^{\infty} \langle e_n, y \rangle e_n \Big\rangle = \sum_{n=1}^{\infty} \langle x, e_n \rangle \langle e_n, y \rangle = \sum_{n=1}^{\infty} \big\langle \langle e_n, x \rangle e_n, y \big\rangle = \langle P_M x, y \rangle,$$

hence $P_M^* = P_M$ and the operator $P_M$ is a projector. Finally from the characterization of $M$ repeated above it is clear that $P_M$ maps onto $M$.

This proposition allows us, for instance, to construct projection operators $P_j = P_{M_j}$ such that

$$P_1 + P_2 + \cdots + P_N = I$$

for any given family $M_1, \dots, M_N$ of pair-wise orthogonal closed linear subspaces $M_j$ of a Hilbert space $\mathcal{H}$ such that $\mathcal{H} = M_1 \oplus M_2 \oplus \cdots \oplus M_N$. Such a family of projection operators is called a resolution of the identity. Later in connection with the spectral theorem for self-adjoint operators we will learn about a continuous analogue. Thus, intuitively, projectors are the basic building blocks of self-adjoint operators.
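Formula (23.1) is easy to try out in $\mathbb{C}^3$. The sketch below builds $P_M$ from a two-element orthonormal system and checks idempotency, symmetry and the orthogonality of $x - P_M x$ to $M$; the function names, the orthonormal system and the test vectors are hypothetical choices for the illustration.

```python
from math import sqrt

def inner(u, v):
    """Inner product <u, v>, antilinear in u and linear in v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def project(onb, x):
    """P_M x = sum_n <e_n, x> e_n for an orthonormal system onb = [e_1, ..., e_N]."""
    result = [0j] * len(x)
    for e in onb:
        c = inner(e, x)
        result = [r + c * ei for r, ei in zip(result, e)]
    return result

# Orthonormal system spanning a two-dimensional subspace M of C^3.
E1 = [1 + 0j, 0j, 0j]
E2 = [0j, 1 / sqrt(2) + 0j, 1j / sqrt(2)]
ONB = [E1, E2]

x = [1 + 1j, 2 + 0j, -1j]
y = [0.5j, 1 - 1j, 3 + 0j]

px = project(ONB, x)
ppx = project(ONB, px)                               # equals px, since P^2 = P
symmetry_gap = abs(inner(y, px) - inner(project(ONB, y), x))   # <y, P x> - <P y, x>
residual = [a - b for a, b in zip(x, px)]            # x - P x lies in the orthogonal complement of M
```

Adding the projector onto the orthogonal complement of $M$ (here the span of the unit vector $(0, 1/\sqrt{2}, -i/\sqrt{2})$) to `project(ONB, x)` returns `x` itself, which is the resolution of the identity for this decomposition.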

Recall that a bounded monotone increasing sequence of real numbers converges. 
The same is true for sequences of projectors if the appropriate notion of monotonicity 
is used. 


Definition 23.2 Let $\mathcal{H}$ be a Hilbert space with inner product $\langle \cdot, \cdot \rangle$. We say that a bounded linear operator $A$ on $\mathcal{H}$ is smaller than or equal to a bounded linear operator $B$ on $\mathcal{H}$, in symbols $A \le B$, if, and only if, for all $x \in \mathcal{H}$ one has $\langle x, Ax \rangle \le \langle x, Bx \rangle$.

We prepare the proof of the convergence of a monotone increasing sequence of 
projectors by 


Lemma 23.1 For two projectors $P, Q$ on a Hilbert space $\mathcal{H}$ the following statements are equivalent:

a) $P \le Q$;
b) $\|Px\| \le \|Qx\|$ for all $x \in \mathcal{H}$;
c) $\operatorname{ran} P \subseteq \operatorname{ran} Q$;
d) $P = PQ = QP$.


Proof Since any projector satisfies $P = P^*P$, the inequality $\langle x, Px \rangle \le \langle x, Qx \rangle$ holds if, and only if, $\langle Px, Px \rangle \le \langle Qx, Qx \rangle$ holds, for all $x \in \mathcal{H}$, thus a) and b) are equivalent.

Assume $\operatorname{ran} P \subseteq \operatorname{ran} Q$ and recall that $y \in \mathcal{H}$ is an element of the range of the projector $Q$ if, and only if, $Qy = y$. The range of $P$ is $P\mathcal{H}$ which by assumption is contained in $\operatorname{ran} Q$, hence $QPx = Px$ for all $x \in \mathcal{H}$ which says $QP = P$; and conversely, if $QP = P$ holds, then clearly $\operatorname{ran} P \subseteq \operatorname{ran} Q$. Since projectors are self-adjoint we know that $P = QP = (QP)^* = P^*Q^* = PQ$, therefore statements c) and d) are equivalent.

If d) holds, then $Px = PQx$ and thus $\|Px\| = \|PQx\| \le \|Qx\|$ for all $x \in \mathcal{H}$, and conversely if $\|Px\| \le \|Qx\|$ for all $x \in \mathcal{H}$, then $Px = QPx + Q^\perp Px$ implies $\|Px\|^2 = \|QPx\|^2 + \|Q^\perp Px\|^2$. Applying the assumption to the vector $Px$ gives $\|Px\| = \|PPx\| \le \|QPx\|$, and hence $Q^\perp Px = 0$ for all $x \in \mathcal{H}$, therefore $P = QP$ and b) and d) are equivalent.


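Theorem 23.1 A monotone increasing sequence $(P_j)_{j \in \mathbb{N}}$ of projectors on a Hilbert space $\mathcal{H}$ converges strongly to a projector $P$ on $\mathcal{H}$.
The null space of the limit is $N(P) = \bigcap_{j=1}^{\infty} N(P_j)$ and its range is $\operatorname{ran} P = \overline{\bigcup_{j=1}^{\infty} \operatorname{ran} P_j}$.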

Proof Pj < Pj+41 means according to Lemma 23.1 that || Pjx|| < || Pj4ixll < |lxll 
for all x € H. Thus the monotone increasing and bounded sequence (|| P;x||) jen of 
numbers converges. Lemma 23.1 implies also that Py = P,P; = P; Py forall k < j 
and therefore 


|| Pix _ P,x|I? = (Px, Pjx) - (Pix, Prx) — (Pxx, P;x) + (Prx, Prx) 
= (x, Pjx) — (x, yx) = || Pjxll? — [| Pexll? 


for all j = k. Since the numerical sequence (|| P;x||) jen converges, we deduce that 
the sequence of vectors (Pjx) jen is a Cauchy sequence in H and thus converges to 
some vector in H. which we denote by Px, 


Px = lim Pjx. 
p= [o.@) 

Since this applies to every x € H,amap H 3 x— Px € H is well defined. Standard 
rules of calculation for limits imply that this map P is linear. The bound || P;x|| < ||| 
for all 7 € N implies that || Px|| < ||x|| holds for every x € H,i.e., P is a bounded 
linear operator on 1. 

Next we show that this operator is symmetric and idempotent. Continuity of the
inner product implies for all $x, y \in \mathcal{H}$,

\[ \langle x, Py\rangle = \lim_{j\to\infty} \langle x, P_j y\rangle = \lim_{j\to\infty} \langle P_j x, y\rangle = \langle Px, y\rangle \]

and $P$ is symmetric.
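Our starting point for the proof of the relation $P = P^2$ is the observation that
$\lim_{j\to\infty} \langle P_j x, P_j y\rangle = \langle Px, Py\rangle$, which follows from the estimate

\[
\begin{aligned}
|\langle Px, Py\rangle - \langle P_j x, P_j y\rangle| &= |\langle Px - P_j x, P_j y\rangle + \langle Px, Py - P_j y\rangle| \\
&\le |\langle Px - P_j x, P_j y\rangle| + |\langle Px, Py - P_j y\rangle| \\
&\le \|y\|\,\|Px - P_j x\| + \|Px\|\,\|Py - P_j y\|
\end{aligned}
\]

and the strong convergence of the sequence $(P_j)_{j\in\mathbb{N}}$. With this result the identity
$P = P^2$ is immediate: since $P_j = P_j^* = P_j^2$, for all $x, y \in \mathcal{H}$ it implies that $\langle x, Py\rangle = \lim_{j\to\infty} \langle x, P_j y\rangle =
\lim_{j\to\infty} \langle P_j x, P_j y\rangle = \langle Px, Py\rangle = \langle x, P^2 y\rangle$ and thus $P = P^2$.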

If a vector $x$ belongs to the kernel of all the projectors $P_j$, then $Px =
\lim_{j\to\infty} P_j x = 0$ implies $x \in N(P)$. Conversely, $P_j \le P$ implies $P_j x = 0$ for
all $j \in \mathbb{N}$ if $Px = 0$.




By monotonicity we know $\|P_j x\| \le \lim_{k\to\infty} \|P_k x\| = \|Px\|$ for all $j \in \mathbb{N}$, hence
by Lemma 23.1 $\operatorname{ran} P_j \subseteq \operatorname{ran} P$ for all $j \in \mathbb{N}$, and therefore the closure of the union of the
ranges of the projectors $P_j$ is contained in the (closed) range of $P$. Since $Px = \lim_{j\to\infty} P_j x$
it is obvious that the range of the limit $P$ is contained in the closure of the union of
the ranges $\operatorname{ran} P_j$.


23.2 Unitary Operators 


23.2.1 Isometries 


The subject of this subsection is the linear maps between two Hilbert spaces which 
do not change the length or norm of vectors. These bounded operators are called 
isometries. 


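Definition 23.3 For two Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$ over the same field $\mathbb{K}$ any linear
map $A : \mathcal{H} \to \mathcal{K}$ with the property

\[ \|Ax\|_{\mathcal{K}} = \|x\|_{\mathcal{H}} \quad \forall x \in \mathcal{H} \]

is called an isometry (between $\mathcal{H}$ and $\mathcal{K}$).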
Since the norm of a Hilbert space is defined in terms of an inner product the 
following convenient characterization of isometries is easily available. 


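Proposition 23.2 Given two Hilbert spaces $\mathcal{H}$, $\mathcal{K}$ over the field $\mathbb{K}$ and a bounded
linear operator $A : \mathcal{H} \to \mathcal{K}$, the following statements hold.

a) $A$ is an isometry $\Leftrightarrow$ $A^*A = I_{\mathcal{H}}$;

b) Every isometry $A$ has an inverse operator $A^{-1} : \operatorname{ran} A \to \mathcal{H}$ and this inverse is
$A^{-1} = A^*|_{\operatorname{ran} A}$;

c) If $A$ is an isometry, then $AA^* = P_{\operatorname{ran} A}$ is the projector onto the range of $A$.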


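Proof The adjoint $A^* : \mathcal{K} \to \mathcal{H}$ of $A$ is defined by the identity $\langle A^*y, x\rangle_{\mathcal{H}} = \langle y, Ax\rangle_{\mathcal{K}}$
for all $x \in \mathcal{H}$ and all $y \in \mathcal{K}$. Thus using the definition of an isometry we get
$\langle x, A^*Ax\rangle_{\mathcal{H}} = \langle Ax, Ax\rangle_{\mathcal{K}} = \langle x, x\rangle_{\mathcal{H}}$ for all $x \in \mathcal{H}$. The polarization identity
implies that $\langle x_1, x_2\rangle_{\mathcal{H}} = \langle x_1, A^*Ax_2\rangle_{\mathcal{H}}$ for all $x_1, x_2 \in \mathcal{H}$ and therefore $A^*A = I_{\mathcal{H}}$.
The converse is obvious.

Certainly, an isometry is injective and thus on its range it has an inverse $A^{-1} :
\operatorname{ran} A \to \mathcal{H}$. The characterization $A^*A = I_{\mathcal{H}}$ of Part a) allows us to identify the inverse
as $A^*|_{\operatorname{ran} A}$.

For Part c) we use the orthogonal decomposition $\mathcal{K} = \operatorname{ran} A \oplus (\operatorname{ran} A)^\perp$ (the range of an isometry is closed) and
determine $AA^*$ on both subspaces. For $y \in (\operatorname{ran} A)^\perp$ the equation $0 = \langle y, Ax\rangle =
\langle A^*y, x\rangle$ for all $x \in \mathcal{H}$ implies $A^*y = 0$ and thus $AA^*y = 0$. For $y \in \operatorname{ran} A$, Part
b) gives $AA^*y = AA^{-1}y = y$ and we conclude.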
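The three statements of Proposition 23.2 can be checked numerically in a small case. The following is only a sketch: the matrix A below is a hypothetical isometry from C^2 into C^3 chosen for illustration (it does not come from the text), and the helper names are likewise illustrative.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def adjoint(X):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(X[i][j]).conjugate() for i in range(len(X))]
            for j in range(len(X[0]))]

def close(X, Y, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    return all(abs(X[i][j] - Y[i][j]) <= tol
               for i in range(len(X)) for j in range(len(X[0])))

# Hypothetical isometry A : C^2 -> C^3 (its two columns are orthonormal).
A = [[0.6, 0.0],
     [0.8, 0.0],
     [0.0, 1.0]]

A_star = adjoint(A)
AsA = matmul(A_star, A)   # 2 x 2, should be the identity on C^2
AAs = matmul(A, A_star)   # 3 x 3, should be the projector onto ran A

I2 = [[1, 0], [0, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

is_isometry = close(AsA, I2)                       # Part a)
is_projector = (close(matmul(AAs, AAs), AAs)       # idempotent
                and close(adjoint(AAs), AAs))      # self-adjoint, Part c)
is_unitary = close(AAs, I3)                        # fails: ran A is a proper subspace
```

In this example A*A is the identity while AA* is a projector different from the identity, so A is an isometry which is not surjective.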




23.2.2 Unitary Operators and Groups of Unitary Operators 


According to Proposition 23.2 the range of an isometric operator $A : \mathcal{H} \to \mathcal{K}$ contains
characteristic information about the operator. In general the range is a proper
subspace of the target space $\mathcal{K}$. The case where this range is equal to the target space
deserves special attention. These operators are discussed in this subsection.


Definition 23.4 A surjective isometry $U : \mathcal{H} \to \mathcal{K}$ is called a unitary operator.
On the basis of Proposition 23.2 unitary operators can be characterized as follows.


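Proposition 23.3 For a bounded linear operator $U : \mathcal{H} \to \mathcal{K}$ these statements are
equivalent:

a) $U$ is unitary;
b) $U^*U = I_{\mathcal{H}}$ and $UU^* = I_{\mathcal{K}}$;
c) $U\mathcal{H} = \mathcal{K}$ and $\langle Ux, Uy\rangle_{\mathcal{K}} = \langle x, y\rangle_{\mathcal{H}}$ for all $x, y \in \mathcal{H}$.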


Note that Part c) of this proposition identifies unitary operators as those surjective 
bounded linear operators which do not change the value of the inner product. Thus 
unitary operators respect the full structure of Hilbert spaces (linear, topological, 
metric and geometric structure). Accordingly unitary operators are the isomorphisms 
of Hilbert spaces. In the chapter on separable Hilbert spaces (Chap. 16) we had 
constructed an important example of such an isomorphism of Hilbert spaces: There 
we constructed a unitary map from a separable Hilbert space over the field $\mathbb{K}$ onto
the sequence space $\ell^2(\mathbb{K})$.

Note also that in the case of finite dimensional spaces every isometry is a unitary 
operator. The proof is done as an exercise. In the case of infinite dimensions there 
are many isometric operators which are not unitary. A simple example is discussed 
in the Exercises. 

For unitary operators of a Hilbert space $\mathcal{H}$ onto itself the composition of mappings
is well defined. The composition of two unitary operators $U, V$ on the Hilbert space
$\mathcal{H}$ is again a unitary operator since by Part b) of Proposition 23.3 $(UV)^*(UV) =
V^*U^*UV = V^*V = I_{\mathcal{H}}$ and $(UV)(UV)^* = UVV^*U^* = UU^* = I_{\mathcal{H}}$. Moreover, the inverse $U^{-1} = U^*$ of a unitary operator is again unitary. Thus the
unitary operators of a Hilbert space form a group, denoted by $\mathcal{U}(\mathcal{H})$.

This group $\mathcal{U}(\mathcal{H})$ contains many important and interesting subgroups. For quantum
mechanics the one-parameter groups of unitary operators play a prominent role.


Definition 23.5 A family of unitary operators $\{U(t) : t \in \mathbb{R}\} \subseteq \mathcal{U}(\mathcal{H})$ is called a
one-parameter group of unitary operators in $\mathcal{H}$ if, and only if, $U(0) = I_{\mathcal{H}}$ and
$U(s)U(t) = U(s + t)$ for all $s, t \in \mathbb{R}$.

Naturally one can view a one-parameter group of unitary operators on $\mathcal{H}$ as a
representation of the additive group $\mathbb{R}$ by unitary operators on $\mathcal{H}$. The importance of
these groups for quantum mechanics comes from the fact that the time evolution of
quantum systems is typically described by such a group.

Under a weak continuity hypothesis the general form of these groups is known. 




Theorem 23.2 (Stone) Let $\{U(t) : t \in \mathbb{R}\}$ be a one-parameter group of unitary
operators on the complex Hilbert space $\mathcal{H}$ which is strongly continuous, i.e., for
every $x \in \mathcal{H}$ the function $\mathbb{R} \ni t \mapsto U(t)x \in \mathcal{H}$ is continuous. Then the set

\[ D = \Bigl\{ x \in \mathcal{H} : \lim_{t\to 0} \frac{1}{t}\,[U(t)x - x] \ \text{exists} \Bigr\} \]

is a dense linear subspace of $\mathcal{H}$ and on $D$ a linear operator $A$ is well defined by

\[ iAx = \lim_{t\to 0} \frac{1}{t}\,[U(t)x - x] \quad \forall x \in D. \]

This operator $A$ is self-adjoint (on $D$). It is called the infinitesimal generator of the
group, which is often expressed in the notation

\[ U(t) = e^{itA}, \quad t \in \mathbb{R}. \]

Proof Since the group $U$ is strongly continuous, the function $\mathbb{R} \ni t \mapsto U(t)x \in \mathcal{H}$
is continuous and bounded (by $\|x\|$) for every $x \in \mathcal{H}$. For every function $f \in \mathcal{D}(\mathbb{R})$,
the function $t \mapsto f(t)U(t)x$ is thus a continuous function of compact support, for
which the existence of the Riemann integral and some basic estimates are shown in
the Exercises. This allows us to define a map $J : \mathcal{D}(\mathbb{R}) \times \mathcal{H} \to \mathcal{H}$ by this integral:

\[ J(f, x) = \int_{\mathbb{R}} f(t)\,U(t)x \, dt. \]


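Since $U$ is strongly continuous, given $\varepsilon > 0$, there is $r > 0$ such that
$\sup_{-r \le t \le r} \|U(t)x - x\| < \varepsilon$. Choose a nonnegative function $\rho_r \in \mathcal{D}(\mathbb{R})$ with the
properties $\int_{\mathbb{R}} \rho_r(t)\, dt = 1$ and $\operatorname{supp} \rho_r \subseteq [-r, r]$ (such functions exist according to
the chapter on test functions) and estimate

\[
\begin{aligned}
\|J(\rho_r, x) - x\| &= \Bigl\| \int \rho_r(t)\,[U(t)x - x]\, dt \Bigr\| \le \int \rho_r(t)\, \|U(t)x - x\| \, dt \\
&\le \|\rho_r\|_1 \sup_{-r \le t \le r} \|U(t)x - x\| \le \varepsilon.
\end{aligned}
\]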


Therefore the set $D_0 = \{J(f, x) : f \in \mathcal{D}(\mathbb{R}),\ x \in \mathcal{H}\}$ is dense in the Hilbert space
$\mathcal{H}$. By changing the integration variable we find the transformation law of the vectors
$J(f, x)$ under the group $U$:

\[ U(t)J(f, x) = J(f_t, x), \qquad f_a(s) = f(s - a) \quad \forall s \in \mathbb{R}. \]


This transformation law and the linearity of $J$ with respect to the first argument
imply that the group $U$ is differentiable on $D_0$: The relation $U(s)J(f, x) - J(f, x) =
J(f_s - f, x)$ gives

\[ \lim_{s\to 0} \frac{1}{s}\,[U(s)J(f, x) - J(f, x)] = \lim_{s\to 0} J\Bigl(\frac{f_s - f}{s}, x\Bigr) = J(-f', x), \]

where we have used that $(f_s - f)/s$ converges uniformly to $-f'$ and that uniform
limits and Riemann integration commute.




Define a function $A : D_0 \to D_0$ by $AJ(f, x) = -iJ(-f', x)$ and extend this
definition by linearity to the linear hull $D$ of $D_0$ to get a densely defined linear
operator $A : D \to D$. Certainly, the linear subspace $D$ is also left invariant by the
action of the group, $U(t)D \subseteq D$ for all $t \in \mathbb{R}$, and a straightforward calculation
shows

\[ AU(t)y = U(t)Ay \quad \forall t \in \mathbb{R},\ \forall y \in D. \]


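The symmetry of the operator $A$ follows from the fact that this operator is defined
as the derivative of a unitary group (modulo the constant $-i$): For all $f, g \in \mathcal{D}(\mathbb{R})$
and all $x, y \in \mathcal{H}$ the following chain of equations holds:

\[
\begin{aligned}
\langle AJ(f, x), J(g, y)\rangle &= \Bigl\langle \lim_{s\to 0} \frac{U(s) - I}{is}\, J(f, x),\ J(g, y) \Bigr\rangle \\
&= \lim_{s\to 0} \Bigl\langle \frac{U(s) - I}{is}\, J(f, x),\ J(g, y) \Bigr\rangle \\
&= \lim_{s\to 0} \Bigl\langle J(f, x),\ \frac{U(s)^* - I}{-is}\, J(g, y) \Bigr\rangle \\
&= \lim_{s\to 0} \Bigl\langle J(f, x),\ \frac{U(-s) - I}{-is}\, J(g, y) \Bigr\rangle \\
&= \langle J(f, x), AJ(g, y)\rangle.
\end{aligned}
\]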


Certainly by linearity this symmetry relation extends to all of D. 

Next, using Corollary 20.3 we show that $A$ is actually essentially self-adjoint
on $D$. This is done by proving $N(A^* \pm iI) = \{0\}$. Suppose $\phi \in D(A^*)$ satisfies
$A^*\phi = i\phi$. Then, for all $\psi \in D$,

\[ \frac{d}{dt}\langle U(t)\psi, \phi\rangle = \langle iAU(t)\psi, \phi\rangle = \langle U(t)\psi, -iA^*\phi\rangle = \langle U(t)\psi, \phi\rangle, \]

i.e., the function $h(t) = \langle U(t)\psi, \phi\rangle$ satisfies the differential equation $h'(t) = h(t)$
for all $t \in \mathbb{R}$ and it follows that $h(t) = h(0)e^{t}$. Since the group $U$ is unitary the
function $h$ is bounded, and this is the case only if $h(0) = \langle \psi, \phi\rangle = 0$. This argument
applies to all $\psi \in D$, hence $\phi \in D^\perp = \{0\}$ ($D$ is dense). Similarly one shows
that $A^*\phi = -i\phi$ is satisfied only for $\phi = 0$. Hence Corollary 20.3 proves $A$ to be
essentially self-adjoint; thus the closure $\overline{A}$ of $A$ is self-adjoint.

When spectral calculus has been developed we will be able to define the exponential
function $e^{itA}$ of an unbounded self-adjoint operator and then we can show
that this exponential function indeed is equal to the given unitary group.
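In finite dimensions the content of Stone's theorem can be made concrete. The sketch below assumes a hypothetical generator A = diag(1.5, -0.5) on C^2 (an illustrative choice, not taken from the text), forms U(t) = exp(itA), and checks the group law, the preservation of the norm, and that the difference quotient (U(h)x - x)/h is close to iAx for small h.

```python
import cmath

# Hypothetical generator: A = diag(lam) on C^2 with real eigenvalues.
lam = [1.5, -0.5]

def U(t, x):
    """Apply U(t) = exp(i t A) to the vector x."""
    return [cmath.exp(1j * t * l) * xi for l, xi in zip(lam, x)]

def norm(x):
    """Euclidean norm on C^2."""
    return sum(abs(xi) ** 2 for xi in x) ** 0.5

x = [1 + 2j, 3 - 1j]

# Group law U(s)U(t) = U(s + t) and norm preservation.
s, t = 0.7, -1.3
group_defect = norm([a - b for a, b in zip(U(s, U(t, x)), U(s + t, x))])
norm_defect = abs(norm(U(t, x)) - norm(x))

# The difference quotient (U(h)x - x)/h approaches iAx as h -> 0.
h = 1e-6
quotient = [(u - xi) / h for u, xi in zip(U(h, x), x)]
iAx = [1j * l * xi for l, xi in zip(lam, x)]
generator_defect = norm([q - v for q, v in zip(quotient, iAx)])
```

Here every vector belongs to the domain D since A is bounded; the density argument of the proof is needed only in the unbounded, infinite dimensional case.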

The continuity hypothesis in Stone's theorem can be relaxed. It suffices to assume
that the group is weakly continuous, i.e., that $\mathbb{R} \ni t \mapsto \langle x, U(t)y\rangle$ is continuous
for every choice of $x, y \in \mathcal{H}$. This is so since on the class $\mathcal{U}(\mathcal{H})$ the weak and
strong topology coincide (see Exercises). In separable Hilbert spaces the continuity
hypothesis can be relaxed even further to weak measurability, i.e., the map $\mathbb{R} \ni t \mapsto
\langle x, U(t)y\rangle \in \mathbb{K}$ is measurable, for every $x, y \in \mathcal{H}$.




23.2.3 Examples of Unitary Operators


In the section on Fourier transformation for tempered distributions we learned that the
Fourier transform $\mathcal{F}_2$ on the Hilbert space $L^2(\mathbb{R}^n)$ is a unitary operator. In the same
Hilbert space we consider several other examples of unitary operators, respectively
groups of such operators.

For $f \in L^2(\mathbb{R}^n)$ and $a \in \mathbb{R}^n$, define $f_a(x) = f(x - a)$ for all $x \in \mathbb{R}^n$ and then
define $U(a) : L^2(\mathbb{R}^n) \to L^2(\mathbb{R}^n)$ by $U(a)f = f_a$ for all $f \in L^2(\mathbb{R}^n)$. For all $f, g \in L^2(\mathbb{R}^n)$
one has

\[ \langle U(a)f, U(a)g\rangle_2 = \int \overline{f(x - a)}\, g(x - a)\, dx = \int \overline{f(y)}\, g(y)\, dy = \langle f, g\rangle_2 \]

and $U(a)$ is an isometry. Given $f \in L^2(\mathbb{R}^n)$, define $g = f_{-a}$ and calculate $U(a)g =
g_a = f$; hence $U(a)$ is surjective and thus a unitary operator, i.e., $U(a) \in \mathcal{U}(L^2(\mathbb{R}^n))$
for all $a \in \mathbb{R}^n$. In addition we find

\[ U(0) = I_{L^2(\mathbb{R}^n)}, \qquad U(a)U(b) = U(a + b) \quad \forall a, b \in \mathbb{R}^n, \]

i.e., $\{U(a) : a \in \mathbb{R}^n\}$ is an $n$-parameter group of unitary operators on $L^2(\mathbb{R}^n)$.
Naturally, $U(a)$ has the interpretation of the operator of translation by $a$.


23.3 Some Applications of Unitary Operators in Ergodic Theory


Ergodic theory generalizes the law of large numbers to random variables which are
identically distributed but not necessarily independent. It started from problems of
statistical physics and nowadays it is a well developed mathematical theory with
many impressive results [1]. Basically ergodic theory is the study of dynamical
systems with an invariant measure; typically the long time behaviour is investigated.
The earliest result of note is the Poincaré recurrence theorem (Theorem 23.3). The central
results of ergodic theory state that under suitable assumptions the time average of a
function along the trajectory of a dynamical system exists and is related to the space
average in a specific way. Here the prominent early results are those of Birkhoff
(Theorem 23.7) and von Neumann (Theorem 23.6). In important special cases the time development of
a dynamical system is described by a strongly continuous one-parameter group of
unitary operators $V(t)$, $t \in \mathbb{R}$, in a separable Hilbert space $\mathcal{H}$. A much used measure
for this long time behaviour is the limit $T \to \infty$ of the time average

\[ \frac{1}{T} \int_0^T V(t)\, dt. \]

If $y \in \mathcal{H}$ is invariant under the group, i.e., $V(t)y = y$ for all $t$, then $\frac{1}{T}\int_0^T V(t)y\, dt = y$
for all $T > 0$ and thus one would expect that this time average converges to
the projector $P$ onto the invariant subspace for the unitary group: $\mathcal{H}_{\mathrm{inv}} =
\{y \in \mathcal{H} : V(t)y = y,\ \forall t\}$. Similarly, the following mean ergodic theorem states that
the average

\[ \frac{1}{N} \sum_{n=0}^{N-1} U^n \]

for a unitary operator $U$ converges strongly to the projection operator $P$ onto the
invariant subspace of $U$.


23.3.1 Poincaré Recurrence Results 


Given a probability space $(X, \Sigma, \mu)$ and a measurable and measure preserving map
$T : X \to X$, i.e., $\mu(T^{-1}(A)) = \mu(A)$ for all $A \in \Sigma$, introduce for $A \in \Sigma$ the sets

\[ T^n(A) = \{y \in X : y = T^n(x),\ x \in A\} \quad \text{and} \quad T^{-n}(A) = \{x \in X : T^n(x) \in A\} \]

of those points which occur as images under $n$ iterations of $T$, respectively the set of
those points in $X$ which under $n$ iterations are elements of $A$.

There is an elementary observation about finite collections $T^{-n}(A)$, $n = 0, \ldots, m$
of these sets:

If $\mu(A) > 0$ and if $m \in \mathbb{N}$ satisfies $m > \frac{1}{\mu(A)}$, then at least two sets of this collection have an
intersection with positive measure: $\mu(T^{-n}(A) \cap T^{-k}(A)) > 0$ for some $0 \le n, k \le m$, $k \ne n$.

If this were not the case, the (finite) additivity of the measure $\mu$ would imply $\mu(\bigcup_{n=0}^{m}
T^{-n}(A)) = \sum_{n=0}^{m} \mu(T^{-n}(A))$ and thus, since $T$ is measure preserving, this equals
$\sum_{n=0}^{m} \mu(A) = \mu(A)(m + 1) > 1$, a contradiction.

Since $T$ is measure preserving we know for all $k > n \ge 0$

\[ \mu(T^{-n}(A) \cap T^{-k}(A)) = \mu(A \cap T^{-(k-n)}(A)). \]

Therefore our observation proves the basic version of the Poincaré recurrence
theorem.


Theorem 23.3 (Poincaré Recurrence Theorem, Basic Version) Let $(X, \Sigma, \mu)$ be
a probability space and $T : X \to X$ a measure preserving map. Then, for any $A \in \Sigma$
with $\mu(A) > 0$ there is $m \in \mathbb{N}$ such that $\mu(A \cap T^{-m}(A)) > 0$, i.e., the set of points
in $A$ which after $m$ iterations of $T$ return to $A$ has positive measure.
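A finite toy model illustrates the basic recurrence statement. The sketch below assumes a hypothetical system that is not taken from the text: X = {0, ..., 11} with the uniform probability measure, the bijection T(x) = x + 5 mod 12 (which preserves the uniform measure), and the set A = {0, 1}. It searches for the smallest m >= 1 for which A and T^{-m}(A) intersect.

```python
from fractions import Fraction

N = 12  # X = {0, ..., 11} with the uniform probability measure

def T(x):
    """A bijection of X, hence measure preserving for the uniform measure."""
    return (x + 5) % N

A = {0, 1}
mu_A = Fraction(len(A), N)

def preimage(S, n):
    """T^{-n}(S) = {x in X : T^n(x) in S}."""
    out = set()
    for x in range(N):
        y = x
        for _ in range(n):
            y = T(y)
        if y in S:
            out.add(x)
    return out

def first_return_time(S):
    """Smallest m >= 1 such that S and T^{-m}(S) have a common point."""
    m = 1
    while not (S & preimage(S, m)):
        m += 1
    return m

m = first_return_time(A)
```

In this model every nonempty set has positive measure, so a nonempty intersection is the same as an intersection of positive measure. The return time found is consistent with the counting argument above, which guarantees a return within the first integer exceeding 1/mu(A) = 6 iterations.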

A modern and much stronger version of this result is 


Theorem 23.4 (Poincaré Recurrence Theorem) Let $(X, \Sigma, \mu)$ be a probability
space and $T : X \to X$ a measure preserving map. Then, for any $A \in \Sigma$,

\[ \mu(\{x \in A : \exists n \in \mathbb{N}\ \forall k \ge n,\ T^k(x) \notin A\}) = 0, \tag{23.2} \]

i.e., the set of points $x \in A$ such that $T^k(x) \notin A$ for all but finitely many $k$ has zero
measure; in other words, almost every point of $A$ returns to $A$ under $T$ infinitely
often; and

\[ \mu\Bigl(A \cap \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} T^{-k}(A)\Bigr) = \mu(A), \tag{23.3} \]

i.e., the set of $x \in A$ which return to $A$ infinitely often has the same measure as $A$.


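Proof Given $A \in \Sigma$ introduce the set $B = \{x \in A : \exists n \in \mathbb{N}\ \forall k \ge n,\ T^k(x) \notin A\}$.
Observe that for $n \in \mathbb{N}$ and $x \in A$ one has

\[ (\forall k \ge n:\ T^k(x) \notin A) \ \Leftrightarrow\ (\forall k \ge n:\ x \notin T^{-k}(A)) \ \Leftrightarrow\ (x \notin A_n) \]

when we define for $n = 0, 1, 2, \ldots$

\[ A_n = \bigcup_{k=n}^{\infty} T^{-k}(A). \]

It follows

\[ B = \bigcup_{n=1}^{\infty} (A \setminus A_n) = A \setminus \bigcap_{n=1}^{\infty} A_n. \]

Clearly $A \subseteq A_0$ and $A_n \subseteq A_m$ for $m \le n$; furthermore one has $A_n = T^{m-n}(A_m)$
and since $T$ is measure preserving it follows $\mu(A_n) = \mu(A_m)$. Now $A \setminus A_n \subseteq A_0 \setminus A_n$
implies $0 \le \mu(A \setminus A_n) \le \mu(A_0 \setminus A_n) = \mu(A_0) - \mu(A_n) = 0$, hence $\mu(A \setminus A_n) = 0$
for $n \in \mathbb{N}$ and therefore

\[ \mu(B) = \mu\Bigl(\bigcup_{n=1}^{\infty} (A \setminus A_n)\Bigr) \le \sum_{n=1}^{\infty} \mu(A \setminus A_n) = 0. \]

With the above representation of the set $B$ statement (23.3) follows immediately
from (23.2).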
Observe that these results can be extended to all finite positive measures. 


23.3.2 The Mean Ergodic Theorem of von Neumann 


Theorem 23.5 (Mean Ergodic Theorem, von Neumann) For any unitary operator
$U$ on a separable Hilbert space $\mathcal{H}$ one has for all $x \in \mathcal{H}$

\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} U^n x = Px \tag{23.4} \]

where $P$ is the orthogonal projector onto the closed linear subspace

\[ \mathcal{H}_{\mathrm{inv}} = \{y \in \mathcal{H} : Uy = y\} \]

of all invariant vectors in $\mathcal{H}$.




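Proof Clearly, the set $\mathcal{H}_{\mathrm{inv}}$ of all vectors in $\mathcal{H}$ which are invariant under $U$ is a closed
linear subspace of $\mathcal{H}$ and thus there is a unique orthogonal projector $P : \mathcal{H} \to \mathcal{H}_{\mathrm{inv}}$.

The linear operator

\[ S_N = \frac{1}{N} \sum_{n=0}^{N-1} U^n \]

is bounded with norm smaller than or equal to 1. For all $x \in \mathcal{H}_{\mathrm{inv}}$ and all $N \in \mathbb{N}$ we find
$S_N x = x$ and thus Eq. (23.4) holds for these vectors and we are left with proving this
equation for all $x \in \mathcal{H}_{\mathrm{inv}}^\perp$.

In order to determine $\mathcal{H}_{\mathrm{inv}}^\perp$ introduce the set $\mathcal{H}_0 = \{Uy - y : y \in \mathcal{H}\}$ and find its
orthogonal complement. If $x \in \mathcal{H}_0^\perp$, then for all $y \in \mathcal{H}$ we have $0 = \langle x, Uy - y\rangle =
\langle U^*x - x, y\rangle$ and thus $U^*x - x = 0$, hence $Ux = x$, i.e., $x \in \mathcal{H}_{\mathrm{inv}}$. This shows
$\mathcal{H}_0^\perp \subseteq \mathcal{H}_{\mathrm{inv}}$.

If $x \in \mathcal{H}_{\mathrm{inv}}$, then also $x = U^*x$ and thus for all $y \in \mathcal{H}$, $0 = \langle y, U^*x - x\rangle =
\langle Uy - y, x\rangle$, hence $x \in \mathcal{H}_0^\perp$. This shows $\mathcal{H}_{\mathrm{inv}} \subseteq \mathcal{H}_0^\perp$ and therefore $\mathcal{H}_{\mathrm{inv}} = \mathcal{H}_0^\perp$ or
$\mathcal{H}_{\mathrm{inv}}^\perp = \overline{\mathcal{H}_0}$, the closure of $\mathcal{H}_0$ in $\mathcal{H}$.

For $x = Uy - y \in \mathcal{H}_0$ one has

\[ S_N x = \frac{1}{N}\,(U^N y - y). \]

We conclude $\|S_N x\| \le \frac{2}{N}\|y\| \to 0$ for $N \to \infty$. Since the sequence of operators
$S_N$ is uniformly bounded, it follows immediately that $\|S_N x\| \to 0$ also holds for
points $x$ in the closure of $\mathcal{H}_0$ and therefore

\[ \lim_{N\to\infty} S_N x = 0, \quad x \in \mathcal{H}_{\mathrm{inv}}^\perp, \]

and this completes the proof of Eq. (23.4).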
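The convergence of the averages and the rate 2||y||/N obtained in the proof can be observed in a two dimensional example. The sketch below assumes the hypothetical unitary operator U = diag(1, exp(i*theta)) on C^2 with theta = 1 (an illustrative choice); its invariant vectors are the multiples of (1, 0), so the projector P keeps the first component and annihilates the second.

```python
import cmath

theta = 1.0  # hypothetical rotation angle, not a multiple of 2*pi

def apply_U(x):
    """U = diag(1, exp(i*theta)) acting on x in C^2."""
    return [x[0], cmath.exp(1j * theta) * x[1]]

def cesaro_mean(x, N):
    """S_N x = (1/N) * sum_{n=0}^{N-1} U^n x."""
    total = [0j, 0j]
    y = list(x)
    for _ in range(N):
        total = [a + b for a, b in zip(total, y)]
        y = apply_U(y)
    return [a / N for a in total]

x = [2 - 1j, 1 + 3j]
Px = [x[0], 0j]  # projection of x onto the invariant vectors

def error(N):
    """Norm of S_N x - P x."""
    S = cesaro_mean(x, N)
    return sum(abs(a - b) ** 2 for a, b in zip(S, Px)) ** 0.5

def bound(N):
    """2*||y||/N with x - Px = Uy - y, i.e. y = (0, x[1]/(exp(i*theta) - 1))."""
    return 2 * abs(x[1]) / (N * abs(cmath.exp(1j * theta) - 1))

errors = [error(N) for N in (10, 100, 1000)]
```

The errors decay like 1/N because x - Px lies in the set H_0 itself; for vectors that are only in the closure of H_0, which can occur in infinite dimensions, the proof gives convergence without a rate.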

Now consider again a probability space $(X, \Sigma, \mu)$ with a measure preserving map
$T : X \to X$. Then it is easy to see that the operator $U = U_T$ defined by $Uf = f \circ T$
is unitary on the Hilbert space $\mathcal{H} = L^2(X, \mu)$. In this case the space of all invariant
vectors is the space of all $f \in L^2(X, \mu)$ such that $f \circ T = f$ and we arrive at the
original version of von Neumann.


Theorem 23.6 (von Neumann) Under the assumptions formulated above, for every
$f \in L^2(X, \mu)$ the sequence $\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f$ converges in $L^2(X, \mu)$ to the projection
of $f$ to the subspace of invariant functions.

In this theorem convergence is in the $L^2(X, \mu)$-sense. Naturally one could ask
under which conditions one has pointwise convergence. This question has been
answered by Birkhoff [2] in his strong ergodic theorem. For this we need to recall
the definition of ergodicity of a measure preserving map.


Definition 23.6 Let $T$ be a measure preserving transformation on a probability
space $(X, \Sigma, \mu)$. $T$ is called ergodic if, and only if, for every $T$-invariant set $A \in \Sigma$
(i.e., $T^{-1}(A) = A$) one has $\mu(A) = 0$ or $\mu(A) = 1$.


Theorem 23.7 (Birkhoff's Pointwise Ergodic Theorem) Under the assumptions
formulated above, for every $f \in L^1(X, \mu)$ the sequence $\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f$ converges
pointwise $\mu$-almost everywhere to a $T$-invariant function $\hat{f}$. If $T$ is ergodic, then $\hat{f}$
is constant and for $\mu$-almost all $x \in X$ one has

\[ \hat{f}(x) = \int f(y)\, d\mu(y). \]


Since the proof of this result obviously cannot be done by Hilbert space methods
we do not present it here and refer to the literature, for instance [4, 5].


23.4 Self-Adjoint Hamilton Operators 


The time evolution of a classical mechanical system is governed by the Hamilton
function. Similarly, the Hamilton operator determines the time evolution of a quantum
mechanical system and this operator provides information about the total energy
of the system in specific states. In both cases it is important that the Hamilton operator
is self-adjoint in the Hilbert space of the quantum mechanical system. Thus we
are faced with the mathematical task of constructing a self-adjoint Hamilton operator
out of a given classical Hamilton function. The Hamilton function is the sum of the
kinetic and the potential energy. For the construction of the Hamilton operator this
typically means that we have to add two unbounded self-adjoint operators.

In the chapter on quadratic forms we have explained a strategy which allows us to
add two unbounded positive operators even if the intersection of their domains of definition
is too small for the natural addition of unbounded operators to be meaningful.
Now we consider the case where the domain of the potential operator contains the domain
of the free Hamilton operator. Then obviously the addition of the two operators
is not a problem. But the question of self-adjointness of the sum remains. The key to
the solution of this problem is to consider the potential energy as a small perturbation
of the free Hamilton operator, in a suitable way. Then indeed self-adjointness of the
sum on the domain of the free Hamilton operator follows.

A first section introduces the basic concepts and results of the theory of Kato
perturbations (see the book of T. Kato, [3]) which is then applied to the case of
Hamilton operators discussed above.


23.4.1 Kato Perturbations 


As in most parts of this book related to quantum mechanics, in this section H is 
assumed to be a complex Hilbert space. The starting point is 


Definition 23.7 Suppose $A$, $B$ are two densely defined linear operators in $\mathcal{H}$. $B$ is
called a Kato perturbation of $A$ if, and only if, $D(A) \subseteq D(B)$ and there are real
numbers $0 \le a < 1$ and $b$ such that

\[ \|Bx\| \le a\|Ax\| + b\|x\| \quad \forall x \in D(A). \tag{23.5} \]




This notion of a Kato perturbation is very effective in solving the problem of self- 
adjointness of the sum, under natural restrictions. 


Theorem 23.8 (Kato—Rellich Theorem) Suppose A is a self-adjoint and B is a 
symmetric operator in H. If B is a Kato perturbation of A, then the sum A + B is 
self-adjoint on the domain D(A). 


Proof According to Part c) of Theorem 20.4 it suffices to show that for some number
$c > 0$ we have $\operatorname{ran}(A + B \pm icI) = \mathcal{H}$. For every $x \in D(A)$ and $c \in \mathbb{R}$ a simple
calculation gives

\[ \|(A + icI)x\|^2 = \|Ax\|^2 + c^2\|x\|^2. \tag{23.6} \]
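The "simple calculation" behind (23.6) can be spelled out as follows. This is a sketch which assumes the convention that the inner product is antilinear in its first argument (with the opposite convention the two mixed terms change sign but still cancel); it uses only that $A$ is symmetric and $c$ is real.

```latex
\begin{aligned}
\|(A + icI)x\|^2
  &= \langle Ax + icx,\; Ax + icx \rangle \\
  &= \|Ax\|^2 + \langle Ax, icx \rangle + \langle icx, Ax \rangle + |ic|^2 \|x\|^2 \\
  &= \|Ax\|^2 + ic\,\bigl( \langle Ax, x \rangle - \langle x, Ax \rangle \bigr) + c^2 \|x\|^2 \\
  &= \|Ax\|^2 + c^2 \|x\|^2 ,
\end{aligned}
```

since $\langle Ax, x\rangle = \langle x, Ax\rangle$ for every $x \in D(A)$ when $A$ is symmetric.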


Hence, for $c \ne 0$, the operator $A + icI$ is injective and thus has an inverse on its
range, which is equal to $\mathcal{H}$ by Theorem 20.4 and which has values in the domain of $A$.
Therefore the elements $x \in D(A)$ can be represented as $x = (A + icI)^{-1}y$, $y \in \mathcal{H}$,
and identity (23.6) can be rewritten as

\[ \|y\|^2 = \|A(A + icI)^{-1}y\|^2 + c^2\|(A + icI)^{-1}y\|^2 \quad \forall y \in \mathcal{H}. \]

This identity has two implications:

\[ \|A(A + icI)^{-1}\| \le 1, \qquad \|(A + icI)^{-1}\| \le \frac{1}{|c|}, \quad c \ne 0. \]

Now use the assumption that $B$ is a Kato perturbation of $A$. For $c > 0$ and $x =
(A + icI)^{-1}y \in D(A)$ the following estimate results:

\[ \|B(A + icI)^{-1}y\| \le a\|A(A + icI)^{-1}y\| + b\|(A + icI)^{-1}y\| \le \Bigl(a + \frac{b}{c}\Bigr)\|y\|. \]

We deduce $\|B(A + icI)^{-1}\| \le a + \frac{b}{c}$. Since $a < 1$ is assumed there is a $c_0 > 0$
such that $a + \frac{b}{c_0} < 1$. Thus $C = B(A + ic_0I)^{-1}$ is a bounded operator with $\|C\| < 1$
and therefore the operator $I + C$ is invertible with inverse given by the Neumann
series (see Eq. (22.4)). This means in particular that the operator $I + C$ has the
range $\operatorname{ran}(I + C) = \mathcal{H}$. Since $A$ is self-adjoint one knows $\operatorname{ran}(A + ic_0I) = \mathcal{H}$ and
therefore that the range of $A + B + ic_0I = (I + C)(A + ic_0I)$ is the whole Hilbert
space. The same argument applies with $-c_0$ in place of $c_0$. Thus we conclude.

One can read the Kato—Rellich theorem as saying that self-adjointness of operators 
is a property which is stable against certain small symmetric perturbations. But 
naturally in a concrete case it might be quite difficult to establish whether or not 
a given symmetric operator is a Kato perturbation of a given self-adjoint operator. 
Thus the core of the following section is to prove that certain classes of potential 
operators are indeed Kato perturbations of the free Hamilton operator. 




23.4.2 Kato Perturbations of the Free Hamiltonian 


Though it can be stated more generally, we present the case of a three dimensional
system explicitly. The Hamilton function of a particle of mass $m > 0$ in the force
field associated with a potential $V$ is

\[ H(p, q) = \frac{p^2}{2m} + V(q) \]

where $q \in \mathbb{R}^3$ is the position variable and $p \in \mathbb{R}^3$ the momentum of the particle.

Recall the realization of the position operator $Q = (Q_1, Q_2, Q_3)$ and of the
momentum operator $P = (P_1, P_2, P_3)$ in the Hilbert space $\mathcal{H} = L^2(\mathbb{R}^3)$ of such a
system. The domain of $Q$ is

\[ D(Q) = \{\psi \in L^2(\mathbb{R}^3) : x_j\psi \in L^2(\mathbb{R}^3),\ j = 1, 2, 3\} \]

and on this domain the component $Q_j$ is defined as the multiplication with the
component $x_j$ of the variable $x \in \mathbb{R}^3$. Such multiplication operators have been
shown to be self-adjoint. Then the observable of potential energy $V(Q)$ is defined
on the domain

\[ D(V) = \{\psi \in L^2(\mathbb{R}^3) : V \cdot \psi \in L^2(\mathbb{R}^3)\} \quad \text{by} \quad (V(Q)\psi)(x) = V(x)\psi(x) \]

for almost all $x \in \mathbb{R}^3$, $\forall \psi \in D(V)$. We assume $V$ to be a real valued function which
is locally square integrable. Then, as we have discussed earlier, $V$ is self-adjoint.

The momentum operator $P$ is the generator of the three parameter group of
translations defined by the unitary operators $U(a)$, $a \in \mathbb{R}^3$, $U(a)\psi = \psi_a$, for
all $\psi \in L^2(\mathbb{R}^3)$. As in the one dimensional case discussed explicitly, this group is
strongly continuous and thus Stone's Theorem 23.2 applies and according to this
theorem the domain of $P$ is characterized by

\[ D(P) = \Bigl\{\psi \in L^2(\mathbb{R}^3) : \lim_{s\to 0} \frac{1}{s}\,(\psi_{se_j} - \psi) \in L^2(\mathbb{R}^3),\ j = 1, 2, 3\Bigr\} \]

where $e_j$ is the unit vector in coordinate direction $j$. Representing the elements $\psi$
of $L^2(\mathbb{R}^3)$ as images under the Fourier transform $\mathcal{F}_2$, $\psi = \mathcal{F}_2(\tilde{\psi})$, the domain $D(P)$
is conveniently described as $D(P) = \{\psi = \mathcal{F}_2(\tilde{\psi}) : \tilde{\psi} \in D(Q)\}$ where $D(Q) =
\{\tilde{\psi} \in L^2(\mathbb{R}^3) : q_j\tilde{\psi}(q) \in L^2(\mathbb{R}^3),\ j = 1, 2, 3\}$. Then the action of the momentum
operator is $P\psi = \mathcal{F}_2(Q\tilde{\psi})$.

Similarly the domain of the free Hamilton operator

\[ H_0 = \frac{1}{2m} P^2 \]

is $D(H_0) = \{\psi = \mathcal{F}_2(\tilde{\psi}) : q^2\tilde{\psi}(q) \in L^2(\mathbb{R}^3)\}$. $H_0$ is self-adjoint on this domain.




The verification that large classes of potential operators V are Kato perturbations 
of the free Hamiltonian is prepared by 


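Lemma 23.2 All $\psi \in D(H_0) \subset L^2(\mathbb{R}^3)$ are bounded by

\[ \|\psi\|_\infty \le 2^{-3/2}\pi^{-1/2}\bigl(r^{-1/2}\, 2m\, \|H_0\psi\|_2 + r^{3/2}\|\psi\|_2\bigr), \quad \text{any } r > 0. \tag{23.7} \]

Proof For every $\psi \in D(H_0)$ we know $(1 + q^2)\tilde{\psi}(q) \in L^2(\mathbb{R}^3)$ and $(1 + q^2)^{-1} \in
L^2(\mathbb{R}^3)$ and thus deduce $\tilde{\psi}(q) = (1 + q^2)^{-1}(1 + q^2)\tilde{\psi}(q) \in L^1(\mathbb{R}^3)$. The Cauchy-Schwarz
inequality implies

\[
\begin{aligned}
\|\tilde{\psi}\|_1 &= \int (1 + q^2)^{-1}(1 + q^2)\,|\tilde{\psi}(q)|\, dq \\
&\le \|(1 + q^2)^{-1}\|_2\, \|(1 + q^2)\tilde{\psi}\|_2 \le \pi\bigl(\|q^2\tilde{\psi}\|_2 + \|\tilde{\psi}\|_2\bigr).
\end{aligned}
\]

Now scale the function with $r > 0$, i.e., consider $\tilde{\psi}_r(q) = r^3\tilde{\psi}(rq)$. A simple
integration shows

\[ \|\tilde{\psi}_r\|_1 = \|\tilde{\psi}\|_1, \qquad \|\tilde{\psi}_r\|_2 = r^{3/2}\|\tilde{\psi}\|_2, \qquad \|q^2\tilde{\psi}_r\|_2 = r^{-1/2}\|q^2\tilde{\psi}\|_2 \]

and thus implies

\[ \|\tilde{\psi}\|_1 = \|\tilde{\psi}_r\|_1 \le \pi\bigl(r^{-1/2}\|q^2\tilde{\psi}\|_2 + r^{3/2}\|\tilde{\psi}\|_2\bigr). \]

For the Fourier transformation the estimate $\|\psi\|_\infty \le (2\pi)^{-3/2}\|\tilde{\psi}\|_1$ is well known; together with $\|q^2\tilde{\psi}\|_2 = 2m\|H_0\psi\|_2$ and $\|\tilde{\psi}\|_2 = \|\psi\|_2$,
estimate (23.7) follows.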


Theorem 23.9 Any potential of the form $V = V_1 + V_2$ with real valued functions
$V_1 \in L^2(\mathbb{R}^3)$ and $V_2 \in L^\infty(\mathbb{R}^3)$ is a Kato perturbation of the free Hamilton operator
and thus the Hamilton operator $H = H_0 + V(Q)$ is self-adjoint on the domain
$D(H_0)$.

Proof For every $\psi \in D(H_0)$ we estimate as follows:

\[ \|V\psi\|_2 \le \|V_1\psi\|_2 + \|V_2\psi\|_2 \le \|V_1\|_2\|\psi\|_\infty + \|V_2\|_\infty\|\psi\|_2. \]

Now the term $\|\psi\|_\infty$ is estimated by our lemma and thus

\[ \|V\psi\|_2 \le a(r)\|H_0\psi\|_2 + b(r)\|\psi\|_2 \]

with

\[ a(r) = (2\pi)^{-1/2}\, m\, r^{-1/2}\|V_1\|_2, \qquad b(r) = 2^{-3/2}\pi^{-1/2} r^{3/2}\|V_1\|_2 + \|V_2\|_\infty. \]

For sufficiently large $r$ the factor $a(r)$ is smaller than 1 so that Theorem 23.8 applies
and proves self-adjointness of $H_0 + V(Q)$.
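How large r has to be can be read off from the relative bound. The sketch below assumes the constants a(r) = (2*pi)^(-1/2) * m * r^(-1/2) * ||V_1||_2 and b(r) = 2^(-3/2) * pi^(-1/2) * r^(3/2) * ||V_1||_2 + ||V_2||_inf as they result from Lemma 23.2; the numerical values of m and of the two norms are hypothetical.

```python
import math

def a(r, m, norm_V1):
    """Relative bound a(r) = (2*pi)^(-1/2) * m * r^(-1/2) * ||V_1||_2."""
    return m * norm_V1 / math.sqrt(2 * math.pi * r)

def b(r, norm_V1, norm_V2_inf):
    """Constant b(r) = 2^(-3/2) * pi^(-1/2) * r^(3/2) * ||V_1||_2 + ||V_2||_inf."""
    return 2 ** -1.5 * math.pi ** -0.5 * r ** 1.5 * norm_V1 + norm_V2_inf

def r_threshold(m, norm_V1):
    """a(r) < 1 holds exactly when r > m^2 * ||V_1||_2^2 / (2*pi)."""
    return (m * norm_V1) ** 2 / (2 * math.pi)

# Hypothetical values, for illustration only.
m, norm_V1, norm_V2_inf = 1.0, 5.0, 2.0

r0 = r_threshold(m, norm_V1)
r = 2 * r0                        # any r > r0 works
a_r = a(r, m, norm_V1)            # equals 1/sqrt(2) for r = 2*r0
b_r = b(r, norm_V1, norm_V2_inf)
```

The price for a small relative bound a(r) is a large constant b(r), which grows like r^(3/2); the Kato-Rellich theorem only requires a(r) < 1 and places no restriction on b(r).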




23.5 Exercises 


1. Consider the Hilbert space $\mathcal{H} = \mathbb{K}^n$ and an isometric map $A : \mathbb{K}^n \to \mathbb{K}^n$. Prove:
$A$ is unitary.

2. In the Hilbert space $\mathcal{H} = \ell^2(\mathbb{K})$ with canonical basis $\{e_n : n \in \mathbb{N}\}$ define a linear
operator $A$ by $A(\sum_{n=1}^{\infty} c_n e_n) = \sum_{n=1}^{\infty} c_n e_{n+1}$, $c_n \in \mathbb{K}$, $\sum_{n=1}^{\infty} |c_n|^2 < \infty$. Show:
$A$ is isometric but not unitary.

3. Show: The weak and strong operator topologies coincide on the space $\mathcal{U}(\mathcal{H})$ of
unitary operators on a Hilbert space $\mathcal{H}$.

4. For a continuous function $x : \mathbb{R} \to \mathcal{H}$ on the real line with values in a Hilbert space
$\mathcal{H}$ which has compact support, prove the existence of the Riemann integral

\[ \int_{\mathbb{R}} x(t)\, dt \]

and the estimate

\[ \Bigl\| \int_{\mathbb{R}} x(t)\, dt \Bigr\| \le \int_{\mathbb{R}} \|x(t)\|\, dt. \]

Hints: As a continuous real valued function of compact support the function
$t \mapsto \|x(t)\|$ is known to be Riemann integrable, hence

\[ \int_{\mathbb{R}} \|x(t)\|\, dt = \lim_{N\to\infty} \sum_{i=1}^{N} \|x(t_{N,i})\|\, \frac{L}{N} \]

where $\{t_{N,i} : i = 1, \ldots, N\}$ is an equidistant partition of the support of the
function $x$ of length $L$. From the existence of this limit deduce that the sequence

\[ S_N = \sum_{i=1}^{N} x(t_{N,i})\, \frac{L}{N}, \quad N \in \mathbb{N} \]

is a Cauchy sequence in the Hilbert space $\mathcal{H}$ and thus this sequence has a limit in
$\mathcal{H}$ which is the Riemann integral of the vector valued function $x$:

\[ \int_{\mathbb{R}} x(t)\, dt = \lim_{N\to\infty} \sum_{i=1}^{N} x(t_{N,i})\, \frac{L}{N}. \]

The estimate for the norm of the Riemann integral follows easily.

Deduce also the standard properties of a Riemann integral, i.e., show that it is linear
in the integrand, additive in its domain of integration and that the fundamental
theorem of calculus holds also for the vector-valued version.

5. Complete the proof of Theorem 25.4.
Hints: For the proof of Part b) see also [3].

6. Show that $(1 + q^2)^{-1} \in L^2(\mathbb{R}^3)$ and calculate $\|(1 + q^2)^{-1}\|_2$.

7. Prove: Potentials of the form $V(x) = \frac{a}{|x|^{\rho}}$ with some constant $a$ are Kato
perturbations of the free Hamilton operator in $L^2(\mathbb{R}^3)$ if $0 < \rho < 1$.
Hints: Denote by $\chi_R$ the characteristic function of the ball with radius $R > 0$ and
define $V_1 = \chi_R V$ and $V_2 = (1 - \chi_R)V$.




References 


1. Billingsley P. Ergodic theory and information. New York: Wiley; 1965.

2. Birkhoff GD. Proof of the ergodic theorem. Proc Natl Acad Sci U S A. 1931;17(12):656-660.

3. Kato T. Perturbation theory for linear operators. Berlin: Springer-Verlag; 1966.

4. Mackey GW. Ergodic theory and its significance for statistical mechanics and probability theory.
Adv Math. 1974;12:178-268.

5. Reed M, Simon B. Functional analysis. vol. 1 of Methods of Modern Mathematical Physics.
2nd ed. New York: Academic Press; 1980.

6. Walters P. An introduction to ergodic theory. vol. 79 of Graduate Texts in Mathematics. New
York: Springer-Verlag; 1982.


Chapter 24 
Elements of Spectral Theory 


The spectrum of a (closed) linear operator on an infinite dimensional Hilbert space
is the appropriate generalization of the set of all eigenvalues of a linear operator
in a finite dimensional Hilbert space. It is defined as the complement (in $\mathbb{C}$) of the
resolvent set. The resolvent set of a linear operator $A$ with domain $D$ in a Hilbert space
$\mathcal{H}$ is defined as the set of all numbers $\lambda \in \mathbb{C}$ for which the inverse operator of $A - \lambda I$
exists as a bounded linear operator $\mathcal{H} \to D$. For closed linear operators the resolvent
set is open. On it the resolvent identity holds. The Weyl criterion characterizes those
real numbers which belong to the spectrum $\sigma(A)$ of a self-adjoint operator $A$. The
point spectrum consists of the set of all eigenvalues of an operator and the continuous
spectrum is the complement of the point spectrum (in $\sigma(A)$). As an illustration the
spectrum of unitary operators is determined. In particular, the spectrum of the Fourier
transformation $\mathcal{F}_2$ on $L^2(\mathbb{R})$ is the set $\{1, i, -1, -i\}$. Through examples it is shown
that the spectrum of an operator depends sensitively on its domain.

The spectrum $\sigma(A)$ of a linear operator in an infinite dimensional Hilbert space
$\mathcal{H}$ is the appropriate generalization of the set of all eigenvalues of a linear operator in
a finite dimensional Hilbert space. We intend to establish this statement in this and
in the later chapters.

If $A$ is a complex $N \times N$ matrix, i.e., a linear operator in the Hilbert space $\mathbb{C}^N$,
one has a fairly simple criterion for eigenvalues: $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ if, and
only if, there is a $\psi_\lambda \in \mathbb{C}^N$, $\psi_\lambda \ne 0$, such that $A\psi_\lambda = \lambda\psi_\lambda$ or $(A - \lambda I)\psi_\lambda = 0$. This
equation has a nontrivial solution if, and only if, the matrix $A - \lambda I$ is not invertible.
In the space of matrices, one has a convenient criterion to decide whether or not a
matrix is invertible. On this space the determinant is well defined and convenient to
use: Thus $A - \lambda I$ is not invertible if, and only if, $\det(A - \lambda I) = 0$. Therefore the
set $\sigma(A)$ of eigenvalues of the $N \times N$ matrix $A$ is given by

\[
\begin{aligned}
\sigma(A) &= \{\lambda \in \mathbb{C} : A - \lambda I \text{ is not invertible}\} &\text{(24.1)} \\
&= \{\lambda \in \mathbb{C} : \det(A - \lambda I) = 0\} = \{\lambda_1, \ldots, \lambda_N\}, &\text{(24.2)}
\end{aligned}
\]

since the polynomial $\det(A - \lambda I)$ of degree $N$ has exactly $N$ roots in $\mathbb{C}$, counted with multiplicity.
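For $N = 2$ the characterization (24.2) can be evaluated by hand, since $\det(A - \lambda I) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$. The following sketch computes the two roots with the quadratic formula; the two matrices are hypothetical examples chosen for illustration.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of det(A - lam*I) = lam^2 - tr(A)*lam + det(A) for a 2x2 matrix."""
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return ((tr - disc) / 2, (tr + disc) / 2)

def char_poly(A, lam):
    """Value of det(A - lam*I) for a 2x2 matrix A."""
    (a, b), (c, d) = A
    return (a - lam) * (d - lam) - b * c

A = [[2, 1], [1, 2]]    # symmetric example with eigenvalues 1 and 3
B = [[0, -1], [1, 0]]   # rotation by 90 degrees with eigenvalues -i and i

spec_A = eigenvalues_2x2(A)
spec_B = eigenvalues_2x2(B)
```

The second example shows why the spectrum is taken in $\mathbb{C}$: the real matrix B has no real eigenvalue, but its characteristic polynomial has the two complex roots -i and i.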
In an infinite dimensional Hilbert space one does not have a substitute for the 


determinant function which is general enough to cover all cases of interest (in special 
cases one can define such a function, and we will mention it briefly later). Thus, in 


© Springer International Publishing Switzerland 2015 343 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_24 




infinite dimensional Hilbert space one can only use the first characterization of o(A) 
which is independent of the dimension of the space. If we proceed with this definition 
the above identity ensures consistency with the finite dimensional case. 


24.1 Basic Concepts and Results 


Suppose that $\mathcal{H}$ is a complex Hilbert space and $D$ is a linear subspace of this space.
Introduce the set of bounded linear operators on $\mathcal{H}$ which map into $D$:

\[ \mathcal{B}(\mathcal{H}, D) = \{A \in \mathcal{B}(\mathcal{H}) : \operatorname{ran} A \subseteq D\}. \]

Our basic definition now reads:


Definition 24.1 Given a linear operator $A$ with domain $D$ in a complex Hilbert
space $\mathcal{H}$, the set

\[ \rho(A) = \{z \in \mathbb{C} : A - zI \text{ has an inverse operator } (A - zI)^{-1} \in \mathcal{B}(\mathcal{H}, D)\} \tag{24.3} \]

is called the resolvent set of $A$ and its complement

\[ \sigma(A) = \mathbb{C} \setminus \rho(A) \tag{24.4} \]

the spectrum of $A$. Finally the function

\[ R_A : \rho(A) \to \mathcal{B}(\mathcal{H}, D), \qquad R_A(z) = (A - zI)^{-1} \tag{24.5} \]

is the resolvent of $A$.

Given a point $z \in \mathbb{C}$, it is in general not straightforward to decide when the
operator $A - zI$ has an inverse in $\mathcal{B}(\mathcal{H}, D)$. Here the auxiliary concept of a regular
point is a good help.


Definition 24.2 Suppose that $A$ is a linear operator in $\mathcal{H}$ with domain $D$. The set
of regular points of $A$ is the set

\[ \rho_r(A) = \{z \in \mathbb{C} : \exists\, \delta(z) > 0 \text{ such that } \|(A - zI)x\| \ge \delta(z)\|x\| \ \ \forall x \in D\}. \tag{24.6} \]


The relation between regular points and points of the resolvent set is obvious and it 
is also clear that the set of regular points is open. For a closed operator the resolvent 
set is open too. 


Lemma 24.1 Suppose A is a linear operator in H with domain D. Then the 
following holds: 


a) $\rho(A) \subseteq \rho_r(A)$;
b) $\rho_r(A) \subset \mathbb{C}$ is open;
c) if A is closed the resolvent set is open too. 




Proof If $z \in \rho(A)$ is given, then the resolvent $R_A(z)$ is a bounded linear operator
$\mathcal{H} \to D$ such that $x = R_A(z)(A - zI)x$ for all $x \in D$, hence
$\|x\| \le \|R_A(z)\|\, \|(A - zI)x\|$ for all $x \in D$. For arbitrary $\varepsilon > 0$ define
$\delta(z) = (\|R_A(z)\| + \varepsilon)^{-1}$. With this choice of $\delta(z)$ we easily see that
$z \in \rho_r(A)$ and Part a) is proven.

Given $z_0 \in \rho_r(A)$ there is a $\delta(z_0) > 0$ such that $\|(A - z_0 I)x\| \ge \delta(z_0)\|x\|$ for all
$x \in D$. For all $z \in \mathbb{C}$ with $|z - z_0| < \frac{1}{2}\delta(z_0)$ we estimate

\begin{align*}
\|(A - zI)x\| &= \|(A - z_0 I)x - (z - z_0)x\| \ge \big|\, \|(A - z_0 I)x\| - \|(z - z_0)x\| \,\big| \\
              &\ge \|(A - z_0 I)x\| - \tfrac{1}{2}\delta(z_0)\|x\| \ge \tfrac{1}{2}\delta(z_0)\|x\| \qquad \forall x \in D,
\end{align*}

hence with the point $z_0$ the disk $\{z \in \mathbb{C} : |z - z_0| < \frac{1}{2}\delta(z_0)\}$ is contained in
$\rho_r(A)$ too. Thus this set is open.

Now we assume that the operator $A$ is closed and $z_0$ is a point in the resolvent
set of $A$. Then $R_A(z_0)$ is a bounded linear operator and thus $r = \|R_A(z_0)\|^{-1} > 0$.
For all $z \in \mathbb{C}$ with $|z - z_0| < r$ this implies that $C = (z - z_0)R_A(z_0)$ is a bounded
operator $\mathcal{H} \to D$ with $\|C\| < 1$. Hence the Neumann series for $C$ converges and it
defines the inverse of $I - C$:

\[ (I - C)^{-1} = \sum_{n=0}^{\infty} C^n. \]

For $z \in \mathbb{C}$ observe that $A - zI = A - z_0 I - (z - z_0)I = (A - z_0 I)[I - (z - z_0)R_A(z_0)]$,
and it follows that for all points $z \in \mathbb{C}$ with $|z - z_0| < r$ the inverse of $A - zI$ exists
and is given by

\[ (A - zI)^{-1} = (I - C)^{-1} R_A(z_0) = \sum_{n=0}^{\infty} (z - z_0)^n R_A(z_0)^{n+1}. \]

In order to show that this inverse operator maps into the domain $D$, consider the
partial sum $S_N = \sum_{n=0}^{N} (z - z_0)^n R_A(z_0)^{n+1}$ of this series. As a resolvent the operator
$R_A(z_0)$ maps into $D$, hence all the partial sums $S_N$ map into $D$. For $x \in \mathcal{H}$ we know
that

\[ y = (A - zI)^{-1}x = \lim_{N \to \infty} S_N x \]

in the Hilbert space $\mathcal{H}$. We claim $y \in D$. To see this calculate

\begin{align*}
(A - zI)S_N x &= [I - (z - z_0)R_A(z_0)](A - z_0 I) \sum_{n=0}^{N} (z - z_0)^n R_A(z_0)^{n+1} x \\
              &= [I - (z - z_0)R_A(z_0)] \sum_{n=0}^{N} (z - z_0)^n R_A(z_0)^{n} x
               = x - (z - z_0)^{N+1} R_A(z_0)^{N+1} x.
\end{align*}

Since $\|(z - z_0)R_A(z_0)\| < 1$, the last term tends to zero and we deduce
$\lim_{N \to \infty}(A - zI)S_N x = x$. Since $A$ is closed, it follows that $y \in D$ and
$(A - zI)y = x$. This proves that $(A - zI)^{-1}$ maps into $D$ and thus is equal to the
resolvent $R_A(z)$, for all $|z - z_0| < r$. The resolvent set is therefore open.
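The Neumann series argument can be tried out numerically in the finite dimensional case. The following Python sketch, under the assumption that $A$ is a $2 \times 2$ matrix stored as nested lists (all helper names are ours), compares a partial sum $S_N$ with the directly computed inverse of $A - zI$.

```python
def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse(X):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def resolvent(A, z):
    """R_A(z) = (A - zI)^(-1) for a 2x2 matrix A."""
    return inverse([[A[0][0] - z, A[0][1]], [A[1][0], A[1][1] - z]])

def neumann_resolvent(A, z0, z, terms):
    """Partial sum S_N = sum_{n=0}^{N} (z - z0)^n R_A(z0)^(n+1) with N = terms."""
    R0 = resolvent(A, z0)
    power = R0                     # holds R_A(z0)^(n+1), starting with n = 0
    coeff = 1.0 + 0j               # holds (z - z0)^n
    total = [[0j, 0j], [0j, 0j]]
    for _ in range(terms + 1):
        total = [[total[i][j] + coeff * power[i][j] for j in range(2)] for i in range(2)]
        power = matmul(power, R0)
        coeff *= (z - z0)
    return total

def max_abs_diff(X, Y):
    """Largest entrywise distance between two 2x2 matrices."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

# Hypothetical sample data: A has eigenvalues 1 and 3, so ||R_A(i)||^(-1) = sqrt(2) > |z - z0|.
A_SAMPLE = [[2, 1], [1, 2]]
print(max_abs_diff(neumann_resolvent(A_SAMPLE, 1j, 0.2 + 1j, 40), resolvent(A_SAMPLE, 0.2 + 1j)))
```

For this sample the point $z$ lies well inside the disk of convergence, so the partial sum agrees with the direct inverse up to rounding errors.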




Corollary 24.1 For a closed linear operator $A$ in a complex Hilbert space $\mathcal{H}$ the
resolvent is an analytic function $R_A : \rho(A) \to \mathcal{B}(\mathcal{H})$. For any point $z_0 \in \rho(A)$ one
has the power series expansion

\[ R_A(z) = \sum_{n=0}^{\infty} (z - z_0)^n R_A(z_0)^{n+1} \tag{24.7} \]

which converges in $\mathcal{B}(\mathcal{H})$ for all $z \in \mathbb{C}$ with $|z - z_0| < \|R_A(z_0)\|^{-1}$.
Furthermore the resolvent identity

\[ R_A(z) - R_A(\zeta) = (z - \zeta) R_A(z) R_A(\zeta) \qquad \forall\, z, \zeta \in \rho(A) \tag{24.8} \]

holds and shows that the resolvents at different points commute.


Proof The power series expansion has been established in the proof of Lemma 24.1.
Since the resolvent maps into the domain of the operator $A$ one has

\begin{align*}
R_A(z) - R_A(\zeta) &= R_A(z)(A - \zeta I)R_A(\zeta) - R_A(z)(A - zI)R_A(\zeta) \\
                    &= R_A(z)[(A - \zeta I) - (A - zI)]R_A(\zeta) = (z - \zeta)R_A(z)R_A(\zeta),
\end{align*}

which proves the resolvent identity.
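As a sanity check of the resolvent identity one may look at the simplest possible case, which we add here only as an illustration: $\mathcal{H} = \mathbb{C}$ and $A$ the multiplication by a fixed number $a \in \mathbb{C}$ (our own choice of example). Then $\rho(A) = \mathbb{C} \setminus \{a\}$, $R_A(z) = (a - z)^{-1}$, and for $z, \zeta \neq a$ one finds

```latex
\begin{align*}
R_A(z) - R_A(\zeta)
  &= \frac{1}{a - z} - \frac{1}{a - \zeta}
   = \frac{(a - \zeta) - (a - z)}{(a - z)(a - \zeta)} \\
  &= \frac{z - \zeta}{(a - z)(a - \zeta)}
   = (z - \zeta)\, R_A(z)\, R_A(\zeta).
\end{align*}
```

The computation mirrors the operator proof step by step; the only difference is that in the scalar case the factors commute trivially.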

Note that a straightforward iteration of the resolvent identity also gives the power 
series expansion of the resolvent. 

Note that according to our definitions the operator $A - zI$ is injective for a regular
point $z \in \mathbb{C}$ and has thus a bounded inverse on its range. For a point $z$ in the resolvent
set $\rho(A)$ the operator $A - zI$ is in addition surjective and its inverse maps the Hilbert
space $\mathcal{H}$ into the domain $D$. Since regular points have a simple characterization one
would like to know when a regular point belongs to the resolvent set. To this end we
introduce the spaces $\mathcal{H}_A(z) = \operatorname{ran}(A - zI) = (A - zI)D \subseteq \mathcal{H}$. If the operator $A$ is
closed these subspaces are closed for regular points $z$. For a regular point $z$ the operator
$A - zI$ has an inverse operator $(A - zI)^{-1} : \mathcal{H}_A(z) \to D$ which is bounded in norm by
$\frac{1}{\delta(z)}$. After these preparations we can easily decide when a regular point belongs to
the resolvent set. This is the case if, and only if, $\mathcal{H}_A(z) = \mathcal{H}$. In the generality in which
we have discussed this problem thus far one cannot say much. However, for densely defined
closed operators and then for self-adjoint operators we know how to proceed.

Recall that a densely defined operator $A$ has a unique adjoint $A^*$ and that the
relation $(\operatorname{ran}(A - zI))^{\perp} = N(A^* - \bar{z}I)$ holds; therefore, for a closed densely defined
operator $A$ and a regular point $z$,

\[ z \in \rho_r(A): \qquad z \in \rho(A) \iff N(A^* - \bar{z}I) = \{0\}. \]


For a self-adjoint operator this criterion is easily verified. Suppose $z \in \rho_r(A)$ and
$x \in N(A^* - \bar{z}I)$, i.e., $A^*x = \bar{z}x$. Since $A$ is self-adjoint it follows that $x \in D$ and
$Ax = \bar{z}x$ and therefore $\bar{z}\langle x, x\rangle = \langle x, Ax\rangle = \langle Ax, x\rangle = z\langle x, x\rangle$. We conclude that
either $x = 0$ or $z = \bar{z}$. The latter case implies $(A - zI)x = 0$ which contradicts the
assumption that $z$ is a regular point, hence $x = 0$. This nearly proves




Theorem 24.1 For a self-adjoint operator $A$ in a complex Hilbert space $\mathcal{H}$ the
resolvent set $\rho(A)$ and the set $\rho_r(A)$ of regular points coincide and the spectrum
$\sigma(A)$ is a nonempty closed subset of $\mathbb{R}$.


Proof As the complement of the open resolvent set, the spectrum $\sigma(A)$ is closed. For
the proof of $\sigma(A) \subseteq \mathbb{R}$ we use the identity $\rho(A) = \rho_r(A)$. For all points
$z = \alpha + i\beta$ one has for all $x \in D$,

\[ \|(A - zI)x\|^2 = \|(A - \alpha I)x - i\beta x\|^2 = \|(A - \alpha I)x\|^2 + |\beta|^2\|x\|^2 \ge |\beta|^2 \|x\|^2, \]

and this lower bound shows that all points $z = \alpha + i\beta$ with $\beta \neq 0$ are regular points.

Here we prove that the spectrum of a bounded operator is not empty. The general 
case of an unbounded self-adjoint operator follows easily from the spectral theorem 
which is discussed in a later chapter (see Theorem 27.5). 

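Suppose that the spectrum $\sigma(A)$ of the bounded self-adjoint operator $A$ is empty.
Then the resolvent $R_A$ is an entire analytic function with values in $\mathcal{B}(\mathcal{H})$ (see
Corollary 24.1). For all points $z \in \mathbb{C}$, $|z| > 2\|A\|$, the resolvent is bounded:
$R_A(z) = -z^{-1}(I - \frac{1}{z}A)^{-1} = -z^{-1}\sum_{n=0}^{\infty} (\frac{1}{z}A)^n$ implies the bound

\[ \|R_A(z)\| \le \sum_{n=0}^{\infty} \|A\|^n |z|^{-n-1} = \frac{1}{|z| - \|A\|} \le \frac{1}{\|A\|}. \]

As an analytic function, $R_A$ is bounded on the compact set $\{z \in \mathbb{C} : |z| \le 2\|A\|\}$ and
hence $R_A$ is a bounded entire function. The theorem of Liouville (Corollary 9.3)
implies that $R_A$ is constant. Since the bound above shows $\|R_A(z)\| \to 0$ as $|z| \to \infty$, this
constant would have to be $0$, which is impossible for an inverse operator. This
contradiction implies that the spectrum is not empty.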
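The lower bound $\|(A - zI)x\| \ge |\beta|\,\|x\|$ used in the proof can be observed numerically for a Hermitian matrix. The sketch below is only an illustration in $\mathbb{C}^2$; the sample matrix and the family of test vectors are our own assumptions.

```python
import math

def apply_shifted(A, z, x):
    """Return (A - zI)x for a 2x2 matrix A and a vector x in C^2."""
    return [A[0][0] * x[0] + A[0][1] * x[1] - z * x[0],
            A[1][0] * x[0] + A[1][1] * x[1] - z * x[1]]

def norm(x):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def min_ratio(A, z, samples=360):
    """Smallest sampled value of ||(A - zI)x|| / ||x|| over a family of unit vectors."""
    best = float("inf")
    for k in range(samples):
        t = 2 * math.pi * k / samples
        for phase in (1, 1j, -1, -1j):
            x = [math.cos(t), phase * math.sin(t)]
            best = min(best, norm(apply_shifted(A, z, x)) / norm(x))
    return best

# Hypothetical Hermitian sample matrix with eigenvalues 1 and 3.
A_HERMITIAN = [[2, 1j], [-1j, 2]]
print(min_ratio(A_HERMITIAN, 0.5 + 0.3j))  # stays above |Im z| = 0.3
```

Every sampled ratio stays above $|\operatorname{Im} z|$, as the estimate predicts; the sampling of course does not replace the proof.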

Since for a self-adjoint operator the resolvent set and the set of regular points are 
the same, a real number belongs to the spectrum if, and only if, it is not a regular 
point. Taking the definition of a regular point into account, points of the spectrum 
can be characterized in the following way. 


Theorem 24.2 (Weyl's criterion). A real number $\lambda$ belongs to the spectrum of a self-adjoint
operator $A$ in a complex Hilbert space $\mathcal{H}$ if, and only if, there is a sequence
$(x_n)_{n \in \mathbb{N}} \subset D(A)$ such that $\|x_n\| = 1$ for all $n \in \mathbb{N}$ and

\[ \lim_{n \to \infty} \|(A - \lambda I)x_n\| = 0. \]
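A simple instance of such a sequence can be written down for a diagonal operator of the kind studied later in this chapter. We assume here, purely for illustration, the operator $Ae_n = \frac{1}{n}e_n$ on an orthonormal basis $(e_n)$; then $x_n = e_n$ is a Weyl sequence for $\lambda = 0$ although $0$ is not an eigenvalue. The Python names below are ours.

```python
def weyl_defect(lam, n):
    """||(A - lam*I) e_n|| for the assumed diagonal operator A e_n = (1/n) e_n."""
    return abs(1.0 / n - lam)

def distance_to_coefficients(lam, n_max):
    """min over n <= n_max of |1/n - lam|; a positive lower bound rules out Weyl sequences e_n."""
    return min(weyl_defect(lam, n) for n in range(1, n_max + 1))

# For lam = 0 the defects tend to zero along the unit vectors e_n.
print([weyl_defect(0.0, n) for n in (1, 10, 100, 1000)])
# For lam = 0.75 the defects stay bounded away from zero.
print(distance_to_coefficients(0.75, 1000))
```

The first list decreases to zero, which places $0$ in the spectrum; for $\lambda = 0.75$ no basis vector comes closer than a fixed positive distance.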


In the following section we study several explicit examples. These examples show 
that in infinite dimensional Hilbert spaces the spectrum does not only consist of 
eigenvalues, but contains various other parts which have no analogue in the finite 
dimensional case. The following definition gives a first division of the spectrum into 
the set of all eigenvalues and some remainder. Later, with the help of the spectral 
theorem, a finer division of the spectrum will be introduced and investigated. 




Definition 24.3 Let $A$ be a closed operator in a complex Hilbert space $\mathcal{H}$ and $\sigma(A)$
its spectrum. The point spectrum $\sigma_p(A)$ of $A$ is the set of all eigenvalues, i.e.,

\[ \sigma_p(A) = \{\lambda \in \sigma(A) : N(A - \lambda I) \neq \{0\}\}. \]

The complement $\sigma(A) \setminus \sigma_p(A)$ of the point spectrum is the continuous spectrum
$\sigma_c(A)$. Finally, the discrete spectrum $\sigma_d(A)$ of $A$ is the set of all eigenvalues $\lambda$ of
finite multiplicity which are isolated in $\sigma(A)$, i.e.,

\[ \sigma_d(A) = \{\lambda \in \sigma_p(A) : \dim N(A - \lambda I) < \infty,\ \lambda \text{ isolated in } \sigma(A)\}. \]


As in the finite dimensional case the eigenspaces to different eigenvalues of a self- 
adjoint operator are orthogonal. 


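Corollary 24.2 Suppose $A$ is a self-adjoint operator in a complex Hilbert space
and $\lambda_j \in \sigma_p(A)$, $j = 1, 2$, are two eigenvalues. If $\lambda_1 \neq \lambda_2$, then the corresponding
eigenspaces are orthogonal: $N(A - \lambda_1 I) \perp N(A - \lambda_2 I)$.


Proof If $A\psi_j = \lambda_j\psi_j$, then $(\lambda_1 - \lambda_2)\langle\psi_1, \psi_2\rangle = \langle\lambda_1\psi_1, \psi_2\rangle - \langle\psi_1, \lambda_2\psi_2\rangle =
\langle A\psi_1, \psi_2\rangle - \langle\psi_1, A\psi_2\rangle = 0$, hence $\langle\psi_1, \psi_2\rangle = 0$ since $\lambda_1 \neq \lambda_2$.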

We conclude this section with the observation that the spectrum of linear operators 
does not change under unitary transformations, more precisely: 


Proposition 24.1 If $A_j$ is a closed operator in the complex Hilbert space $\mathcal{H}_j$,
$j = 1, 2$, and if there is a unitary map $U : \mathcal{H}_1 \to \mathcal{H}_2$ such that $D(A_2) = U D(A_1)$
and $A_2 = U A_1 U^{-1}$, then both operators have the same spectrum: $\sigma(A_1) = \sigma(A_2)$.


Proof See Exercises. 


24.2 The Spectrum of Special Operators 


In general, it is quite a difficult problem to determine the spectrum of a closed or 
self-adjoint operator. The best one can do typically is to give some estimate in those 
cases where more information about the operator is available. In special cases, for 
instance in cases of self-adjoint realizations of certain differential operators, one can 
determine the spectrum exactly. We consider a few examples. 


Proposition 24.2 The spectrum $\sigma(U)$ of a unitary operator $U$ on a complex Hilbert
space $\mathcal{H}$ is contained in the unit circle $\{z \in \mathbb{C} : |z| = 1\}$.


Proof If $|z| < 1$, then we write $U - zI = U(I - zU^{-1})$. Since the operator $zU^{-1}$
has a norm smaller than 1, the Neumann series can be used to find the bounded
inverse of $U - zI$. Similarly, for $|z| > 1$, we write $U - zI = -z(I - \frac{1}{z}U)$. This time
the Neumann series for the operator $\frac{1}{z}U$ allows us to calculate the inverse. Thus all
points $z \in \mathbb{C}$ with $|z| < 1$ or $|z| > 1$ belong to the resolvent set and therefore the
spectrum is contained in the unit circle.




It is somewhat surprising that the spectrum of the Fourier transformation $\mathcal{F}_2$ on
the Hilbert space $L^2(\mathbb{R})$ can be calculated.


Proposition 24.3 The spectrum of the Fourier transformation $\mathcal{F}_2$ on the Hilbert
space $L^2(\mathbb{R})$ is $\sigma(\mathcal{F}_2) = \{1, i, -1, -i\}$.


Proof The system of Hermite functions $\{h_n : n = 0, 1, 2, \ldots\}$ is an orthonormal
basis of $L^2(\mathbb{R})$ (see Eq. (17.1)). In the Exercises we show by induction with respect
to the order $n$ that

\[ \mathcal{F}_2(h_n) = (-i)^n h_n, \qquad n = 0, 1, 2, \ldots \]

holds. Thus we know a complete set of orthonormal eigenfunctions together with the
corresponding eigenvalues. Therefore we can represent the Fourier transformation
as

\[ \mathcal{F}_2 = \sum_{n=0}^{\infty} (-i)^n P_n \]

where $P_n$ is the projector onto the subspace generated by the eigenfunction $h_n$. In the
following example we determine the spectrum of operators which are represented
as a series of projectors onto an orthonormal basis with arbitrary coefficients $s_n$.
One finds that the spectrum of such an operator is the closure of the set of coefficients
which in the present case is $\{1, i, -1, -i\}$.
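The eigenvalue relation for the first two Hermite functions can be checked by numerical quadrature. The sketch below assumes the convention $(\mathcal{F}_2 f)(p) = (2\pi)^{-1/2}\int e^{-ipx} f(x)\,dx$ and the normalization $h_0(x) = \pi^{-1/4}e^{-x^2/2}$, $h_1(x) = \sqrt{2}\,\pi^{-1/4}x\,e^{-x^2/2}$; both are assumptions on our side, chosen to be consistent with the eigenvalues $(-i)^n$, and the function names are hypothetical.

```python
import cmath
import math

def h0(x):
    """Assumed normalized Hermite function of order 0."""
    return math.pi ** -0.25 * math.exp(-x * x / 2)

def h1(x):
    """Assumed normalized Hermite function of order 1."""
    return math.pi ** -0.25 * math.sqrt(2.0) * x * math.exp(-x * x / 2)

def fourier(f, p, cutoff=12.0, steps=4800):
    """Trapezoidal approximation of (2*pi)^(-1/2) * integral of exp(-i*p*x) f(x) dx over [-cutoff, cutoff]."""
    dx = 2 * cutoff / steps
    total = 0j
    for k in range(steps + 1):
        x = -cutoff + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * p * x) * f(x)
    return total * dx / math.sqrt(2 * math.pi)

p = 0.7
print(abs(fourier(h0, p) - h0(p)))          # eigenvalue (-i)^0 = 1
print(abs(fourier(h1, p) - (-1j) * h1(p)))  # eigenvalue (-i)^1 = -i
```

Both differences are at the level of rounding errors, since the trapezoidal rule is extremely accurate for rapidly decaying smooth integrands.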


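Example 24.1


1. Suppose that $\{e_n : n \in \mathbb{N}\}$ is an orthonormal basis of the complex Hilbert space
   $\mathcal{H}$ and $\{s_n : n \in \mathbb{N}\} \subset \mathbb{C}$ is some sequence of complex numbers. Introduce the set

   \[ D = \Big\{x \in \mathcal{H} : \sum_{n=1}^{\infty} |s_n|^2 |\langle e_n, x\rangle|^2 < \infty\Big\} \]

   and for $x \in D$ define

   \[ Ax = \sum_{n=1}^{\infty} s_n \langle e_n, x\rangle e_n. \tag{24.9} \]

   In the Exercises we show that $A$ is a densely defined closed linear operator with
   adjoint

   \[ A^*y = \sum_{n=1}^{\infty} \bar{s}_n \langle e_n, y\rangle e_n \qquad \forall y \in D. \]

   We claim:

   \[ \sigma(A) = \overline{\{s_n : n \in \mathbb{N}\}}. \]

   If $z \in \mathbb{C}$ and $z \notin \overline{\{s_n : n \in \mathbb{N}\}}$, then $\delta(z) = \inf\{|s_n - z| : n \in \mathbb{N}\} > 0$
   and thus for all $x \in D$,

   \[ \|(A - zI)x\|^2 = \sum_{n=1}^{\infty} |s_n - z|^2 |\langle e_n, x\rangle|^2 \ge \delta(z)^2 \sum_{n=1}^{\infty} |\langle e_n, x\rangle|^2 = \delta(z)^2 \|x\|^2. \]

   Thus these points are regular points of the operator $A$. They are actually points
   of the resolvent set since one shows that the inverse of $A - zI$ is given by

   \[ (A - zI)^{-1}x = \sum_{n=1}^{\infty} \frac{\langle e_n, x\rangle e_n}{s_n - z} \]

   and this operator maps $\mathcal{H}$ into $D$. Conversely, every $s_n$ is an eigenvalue of $A$ since
   $Ae_n = s_n e_n$, and the spectrum is closed. We conclude
   $\rho(A) = \mathbb{C} \setminus \overline{\{s_n : n \in \mathbb{N}\}}$ and this proves our claim.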
2. For a continuous function $g : \mathbb{R}^n \to \mathbb{C}$ define the domain

   \[ D_g = \{f \in L^2(\mathbb{R}^n) : gf \in L^2(\mathbb{R}^n)\}. \]

   As we have shown earlier the operator $M_g$ of multiplication with the function $g$
   is a densely defined closed linear operator in the Hilbert space $\mathcal{H} = L^2(\mathbb{R}^n)$ and
   its adjoint is the operator of multiplication with the complex conjugate function
   $\bar{g}$. We claim that the spectrum of the operator $M_g$ is the closure of the range of
   the function $g$,

   \[ \sigma(M_g) = \overline{\operatorname{ran} g} \]

   where $\operatorname{ran} g = \{\lambda \in \mathbb{C} : \lambda = g(x) \text{ for some } x \in \mathbb{R}^n\}$. Since
   $\mathbb{C} \setminus \overline{\operatorname{ran} g}$ is open, every point $z \in \mathbb{C} \setminus \overline{\operatorname{ran} g}$ has a positive
   distance from $\operatorname{ran} g$, i.e.,

   \[ \delta(z) = \inf\{|g(x) - z| : x \in \mathbb{R}^n\} > 0. \]

   Hence $x \mapsto (g(x) - z)^{-1}$ is a continuous function on $\mathbb{R}^n$ which is bounded by
   $\frac{1}{\delta(z)}$. It follows that $(M_g - zI)^{-1}$, defined by $(M_g - zI)^{-1}f = \frac{1}{g - z}f$, is a
   bounded linear operator on $L^2(\mathbb{R}^n)$. Since the integral

   \[ \int_{\mathbb{R}^n} \Big|\frac{g(x)}{g(x) - z}\Big|^2 |f(x)|^2 \, dx \]

   is finite for all $f \in L^2(\mathbb{R}^n)$ the operator $(M_g - zI)^{-1}$ maps $L^2(\mathbb{R}^n)$ into the domain
   $D_g$ of $M_g$. This proves that $\rho(M_g) = \mathbb{C} \setminus \overline{\operatorname{ran} g}$ and we conclude.

In the case that the function $g$ is real valued and not constant this is an example of
an operator whose spectrum contains open intervals, i.e., the continuous spectrum
is not empty in this case. In this case the operator $M_g$ has no eigenvalues (see
Exercises).
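The resolvent formula of the first item of this example is easy to experiment with once the operator is truncated to finitely many basis vectors. The following Python sketch works with coordinate lists with respect to $e_1, \ldots, e_m$; the truncation, the sample coefficients $s_n = 1/n$ and all function names are our own assumptions.

```python
import math

def delta(z, coefficients):
    """Distance from z to the (finite) list of coefficients s_n."""
    return min(abs(s - z) for s in coefficients)

def apply_resolvent(z, coefficients, x):
    """Coordinates of (A - zI)^(-1) x: divide the n-th coordinate by s_n - z."""
    return [c / (s - z) for s, c in zip(coefficients, x)]

def apply_shifted(z, coefficients, y):
    """Coordinates of (A - zI) y: multiply the n-th coordinate by s_n - z."""
    return [(s - z) * c for s, c in zip(coefficients, y)]

def norm(x):
    """Euclidean norm of a coordinate list."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

# Hypothetical sample: s_n = 1/n for n = 1..50 and a point z off the closure of {s_n}.
S_SAMPLE = [1.0 / n for n in range(1, 51)]
Z_SAMPLE = -0.5 + 0.5j
X_SAMPLE = [complex(1, -n) for n in range(50)]
Y_SAMPLE = apply_resolvent(Z_SAMPLE, S_SAMPLE, X_SAMPLE)
print(norm(Y_SAMPLE) <= norm(X_SAMPLE) / delta(Z_SAMPLE, S_SAMPLE))
```

The printed comparison reflects the bound $\|(A - zI)^{-1}\| \le 1/\delta(z)$ that follows from the estimate for regular points.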


24.3 Comments on Spectral Properties of Linear Operators 


In Definition 24.3 the complement $\sigma_c(A) = \sigma(A) \setminus \sigma_p(A)$ of the point spectrum
has been called the continuous spectrum of $A$. This terminology is quite unfortunate
since it is often rather misleading: The continuous spectrum can be a discrete
set. To see this, consider Example 24.2.1 and choose there the sequence $s_n = \frac{1}{n}$.
Then the spectrum of the operator $A$ defined through this sequence is
$\sigma(A) = \overline{\{\frac{1}{n} : n \in \mathbb{N}\}} = \{1, \frac{1}{2}, \frac{1}{3}, \ldots, 0\}$ while the point spectrum is
$\sigma_p(A) = \{1, \frac{1}{2}, \frac{1}{3}, \ldots\}$, hence the continuous spectrum is just one point:
$\sigma_c(A) = \sigma(A) \setminus \sigma_p(A) = \{0\}$.

It is very important to be aware of the fact that the spectrum of an operator depends 
on its domain in a very sensitive way. To illustrate this point we are going to construct 
two unbounded linear operators which consist of the same rule of assignment but on 
different domains. The resulting operators have completely different spectra. 


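In the Hilbert space $\mathcal{H} = L^2([0, 1])$ introduce two dense linear subspaces

\begin{align*}
D_1 &= \{f \in L^2([0, 1]) : f \text{ is absolutely continuous},\ f' \in L^2([0, 1])\}, \\
D_2 &= \{f \in D_1 : f(0) = 0\}.
\end{align*}

Denote by $P_j$ the operator of differentiation $i\frac{d}{dx}$ on the domain $D_j$, $j = 1, 2$. Both
operators $P_1$, $P_2$ are closed.
For every $\lambda \in \mathbb{C}$ the exponential function $e_\lambda$, $e_\lambda(x) = e^{-i\lambda x}$, belongs to the
domain $D_1$ and clearly $(P_1 - \lambda I)e_\lambda = 0$. We conclude that $\sigma(P_1) = \mathbb{C}$.
Elementary calculations show that the operator $R_\lambda$ defined by

\[ (R_\lambda f)(x) = i \int_0^x e^{-i\lambda(x - y)} f(y)\, dy \qquad \forall f \in L^2([0, 1]) \]

has the following properties: $R_\lambda$ maps $L^2([0, 1])$ into $D_2$ and

\[ (\lambda I - P_2)R_\lambda = I, \qquad R_\lambda(\lambda I - P_2) = I|_{D_2}. \]

Clearly $R_\lambda$ is a bounded operator on $L^2([0, 1])$, hence $R_\lambda \in \mathcal{B}(L^2([0, 1]), D_2)$. This
is true for every $\lambda \in \mathbb{C}$ and we conclude that $\rho(P_2) = \mathbb{C}$, hence $\sigma(P_2) = \emptyset$.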
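The inversion property of $R_\lambda$ can be tested on a simple function. The sketch below assumes the conventions $P = i\frac{d}{dx}$ and $(R_\lambda f)(x) = i\int_0^x e^{-i\lambda(x-y)}f(y)\,dy$ (a reading of the formulas above that we adopt as an assumption) and takes $f = 1$, for which the integral has the closed form $(1 - e^{-i\lambda x})/\lambda$ when $\lambda \neq 0$. The function names are hypothetical.

```python
import cmath

def R_of_one(lam, x):
    """Closed form of (R_lam f)(x) for f = 1 and lam != 0, under the assumed conventions."""
    return (1 - cmath.exp(-1j * lam * x)) / lam

def P(u, x, h=1e-5):
    """Central-difference approximation of (P u)(x) = i u'(x)."""
    return 1j * (u(x + h) - u(x - h)) / (2 * h)

def residual(lam, x):
    """|((lam*I - P) R_lam f)(x) - f(x)| for f = 1; close to zero if R_lam inverts lam*I - P."""
    u = lambda t: R_of_one(lam, t)
    return abs(lam * u(x) - P(u, x) - 1)

print(abs(R_of_one(2 + 1j, 0.0)))  # boundary condition u(0) = 0
print(residual(2 + 1j, 0.5))
print(residual(-3j, 0.25))
```

The boundary value at $x = 0$ vanishes, so $R_\lambda f$ lies in $D_2$, and the residual is small for every sample value of $\lambda$, including nonreal ones.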

Without proof we mention an interesting result about the spectrum of a closed 
symmetric operator. The spectrum determines whether such an operator is self- 
adjoint or not! 


Theorem 24.3 A closed symmetric operator $A$ in a complex Hilbert space $\mathcal{H}$ is
self-adjoint if, and only if, its spectrum $\sigma(A)$ is a subset of $\mathbb{R}$.

This result is certainly another strong motivation why in quantum mechanics ob- 
servables should be represented by self-adjoint operators and not only by symmetric 
operators, since in quantum mechanics the expectation values of observables have 
to be real. 

One can also show that the spectrum of an essentially self-adjoint operator is
contained in $\mathbb{R}$. The converse of these results reads: If the spectrum $\sigma(A)$ of a
symmetric operator $A$ in a complex Hilbert space $\mathcal{H}$ is contained in $\mathbb{R}$, then the operator
is either self-adjoint or essentially self-adjoint. This implies that the spectrum $\sigma(A)$
of a symmetric but neither self-adjoint nor essentially self-adjoint operator contains
non-real complex points.




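But clearly there are nonsymmetric operators with purely real spectrum. For
instance, in the complex Hilbert space $\mathcal{H} = \mathbb{C}^2$ take the real matrix

\[ A = \begin{pmatrix} a & c \\ 0 & b \end{pmatrix}, \qquad a, b \in \mathbb{R},\ c \neq 0. \]

Obviously, $\sigma(A) = \{a, b\}$.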

When we stated and proved earlier that the spectrum of a bounded linear operator
in a complex Hilbert space is not empty, it was essential that we considered a Hilbert
space over the field of complex numbers. A simple example of a bounded operator
in a real Hilbert space with empty spectrum is

\[ \mathcal{H} = \mathbb{R}^2, \qquad A = \begin{pmatrix} 0 & a \\ -b & 0 \end{pmatrix}, \qquad a, b \in \mathbb{R},\ ab > 0. \]

The proof is obvious: the characteristic polynomial $\lambda^2 + ab$ has no real root.

Finally, we comment on the possibility to define a substitute for the determinant
for linear operators in an infinite dimensional space. If the self-adjoint bounded linear
operator $A$ in a complex Hilbert space $\mathcal{H}$ has suitable spectral properties, then indeed
a kind of determinant function $\det A$ can be defined. Suppose $A$ can be written as
$A = I + R$ with a self-adjoint trace class operator $R$. Then one defines

\[ \det A = e^{\operatorname{Tr}(\log A)}. \]


The book [1] contains a fairly detailed discussion of this problem. 


24.4 Exercises 


1. Prove Proposition 24.1. 
2. Consider the Hermite functions $h_n$, $n = 0, 1, 2, \ldots$. Use the recursion relation
(17.2) for the Hermite polynomials to deduce the recursion relation

\[ h_{n+1}(x) = (2n + 2)^{-1/2}\,[x h_n(x) - h_n'(x)] \]

for the Hermite functions. Then prove by induction: $\mathcal{F}_2 h_n = (-i)^n h_n$.

3. Prove that the operator defined by Eq. (24.9) is densely defined and closed and 
determine its adjoint. 

4. Show: The self-adjoint operator of multiplication with a real-valued continuous 
function which is not constant has no eigenvalues. 

5. Prove the details in the examples of Sect. 24.3 on spectral properties of linear 
operators. 




Reference 


1. Reed M, Simon B. Analysis of operators. In: Methods of modern mathematical physics. Vol. 4. 
San Diego: Academic; 1978. 


Chapter 25 
Compact Operators 


25.1 Basic Theory 


In the introduction to the theory of Hilbert spaces, we mentioned that a substantial 
part of this theory has its origin in D. Hilbert’s research on the problem to extend the 
well-known theory of eigenvalues of matrices to the case of “infinite dimensional 
matrices” or linear operators in an infinite dimensional space. A certain limit (to be 
specified later) of finite dimensional matrices gives a class of operators, which are 
called compact. Accordingly, the early results in the theory of bounded operators on 
infinite dimensional Hilbert spaces were mainly concerned with this class of compact 
operators, which were typically investigated in separable spaces. We take a slightly 
more general approach. 


Definition 25.1 Let $\mathcal{H}$ and $\mathcal{K}$ be two Hilbert spaces over the field $\mathbb{K}$. A bounded
linear operator $K : \mathcal{H} \to \mathcal{K}$ is called compact (or completely continuous) if, and
only if, it maps every bounded set in $\mathcal{H}$ onto a precompact set of $\mathcal{K}$ (this means that
the closure of the image of a bounded set is compact), i.e., if and only if, for every
bounded sequence $(e_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ the sequence of images $(Ke_n)_{n \in \mathbb{N}} \subset \mathcal{K}$ contains a
convergent subsequence.

In particular, in concrete problems, the following characterization of compact 
operators is very helpful. 


Theorem 25.1 (Characterization of Compact Operators) Let $\mathcal{H}$ and $\mathcal{K}$ be two
Hilbert spaces over the field $\mathbb{K}$ and $A : \mathcal{H} \to \mathcal{K}$ a bounded linear operator. $A$ is
compact if it satisfies one (and thus all) of the following equivalent conditions.


a) The image of the open unit ball $B_1(0) \subset \mathcal{H}$ under $A$ is precompact in $\mathcal{K}$.

b) The image of every bounded set $B \subset \mathcal{H}$ under $A$ is precompact in $\mathcal{K}$.

c) For every bounded sequence $(x_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ the sequence of images $(Ax_n)_{n \in \mathbb{N}}$ in $\mathcal{K}$
contains a convergent subsequence.

d) The operator $A$ maps weakly convergent sequences in $\mathcal{H}$ into norm convergent
sequences in $\mathcal{K}$.


© Springer International Publishing Switzerland 2015 355 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_25 




Proof The proof proceeds according to the following steps: a) $\Rightarrow$ b) $\Rightarrow$ c) $\Rightarrow$ d) $\Rightarrow$ a).

Assume a), that is assume $A(B_1(0))$ is precompact in $\mathcal{K}$ and consider any bounded
set $B \subset \mathcal{H}$. It follows that $B$ is contained in some ball $B_r(0) = rB_1(0)$ for suitable
$r > 0$, hence $A(B) \subseteq A(rB_1(0)) = rA(B_1(0))$, thus $A(B)$ is precompact and b)
follows.

Next assume b) and recall the Bolzano-Weierstrass result that a subset of a metric space is
compact if, and only if, every sequence in it contains a subsequence which converges in
this subset. Since a bounded sequence is a bounded set, statement c) holds.

Now assume c) and consider a sequence $(x_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ which converges weakly
to $x \in \mathcal{H}$. For any $z \in \mathcal{K}$ we find
$\langle z, Ax_n\rangle_{\mathcal{K}} = \langle A^*z, x_n\rangle_{\mathcal{H}} \to \langle A^*z, x\rangle_{\mathcal{H}} = \langle z, Ax\rangle_{\mathcal{K}}$
as $n \to \infty$, i.e., the sequence of images converges weakly to the image of the weak
limit.

Suppose that the sequence $(y_n = Ax_n)_{n \in \mathbb{N}}$ does not converge in norm to $y = Ax$.
Then there are $\varepsilon > 0$ and a subsequence $(y_{n(j)} = Ax_{n(j)})_{j \in \mathbb{N}}$ such that
$\|y - y_{n(j)}\|_{\mathcal{K}} \ge \varepsilon$ for all $j \in \mathbb{N}$. Since $(x_{n(j)})_{j \in \mathbb{N}}$ is a bounded sequence there
is a subsequence $(x_{n(j_i)})_{i \in \mathbb{N}}$ for which $(y_{n(j_i)} = Ax_{n(j_i)})_{i \in \mathbb{N}}$ converges in norm, because
c) is assumed. The limit of this sequence is $y = Ax$ since the weak limit of this sequence
is $y$, but this is a contradiction to the construction of the subsequence $(y_{n(j)})_{j \in \mathbb{N}}$ and thus
$\lim_{n \to \infty} Ax_n = Ax$ in norm. This proves part d).

Finally assume d). Take any sequence $(x_n)_{n \in \mathbb{N}} \subset B_1(0)$. By Theorem 19.3 there is a
weakly convergent subsequence $(x_{n(j)})_{j \in \mathbb{N}}$. According to assumption d) the sequence
$(Ax_{n(j)})_{j \in \mathbb{N}} \subset A(B_1(0))$ of images converges in norm. The Theorem of Bolzano-Weierstrass
implies that $A(B_1(0))$ is precompact. Thus, we conclude.


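Definition 25.2 A bounded linear operator $A : \mathcal{H} \to \mathcal{K}$ with finite dimensional range
is called an operator of finite rank.

The general form of an operator $A$ of finite rank is easily determined. The result
is (see Exercise)

\[ Ax = \sum_{j=1}^{N} \langle f_j, x\rangle_{\mathcal{H}}\, e_j \qquad \forall x \in \mathcal{H}, \]

where $\{e_1, \ldots, e_N\}$ is some finite orthonormal system in $\mathcal{K}$ and $f_1, \ldots, f_N$ are some
vectors in $\mathcal{H}$. If now a sequence $(x_n)_{n \in \mathbb{N}} \subset \mathcal{H}$ converges weakly to $x \in \mathcal{H}$ then, for
$j = 1, \ldots, N$, $\langle f_j, x_n\rangle_{\mathcal{H}} \to \langle f_j, x\rangle_{\mathcal{H}}$ and thus, as $n \to \infty$,

\[ Ax_n = \sum_{j=1}^{N} \langle f_j, x_n\rangle_{\mathcal{H}}\, e_j \to \sum_{j=1}^{N} \langle f_j, x\rangle_{\mathcal{H}}\, e_j = Ax. \]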


We conclude that operators of finite rank are compact. 
The announced approximation of compact operators by matrices takes the 
following precise form. 


Theorem 25.2 In a separable Hilbert space H every compact operator A is the 
norm limit of a sequence of operators of finite rank. 




Proof Let $\{e_j : j \in \mathbb{N}\}$ be an orthonormal basis of $\mathcal{H}$ and introduce the projectors $P_n$
onto the subspace $[e_1, \ldots, e_n]$ spanned by the first $n$ basis vectors. Proposition 23.1
implies that the sequence of projectors $P_n$ converges strongly to the identity $I$. With
$P_n^{\perp} = I - P_n$ define

\[ d_n = \sup\{\|AP_n^{\perp}x\| : \|x\| = 1\} = \sup\{\|Ay\| : y \in S_n\}, \]

where $S_n = \{y \in [e_1, \ldots, e_n]^{\perp} : \|y\| = 1\}$. Clearly $(d_n)_{n \in \mathbb{N}}$ is a monotone decreasing
sequence of non-negative numbers. Thus, this sequence has a limit $d \ge 0$. For every
$n \in \mathbb{N}$ there is $y_n \in S_n$ such that $\|Ay_n\| \ge \frac{d_n}{2}$. $y_n \in S_n$ means: $\|y_n\| = 1$ and
$y_n = P_n^{\perp}y_n$, hence $\langle x, y_n\rangle = \langle P_n^{\perp}x, y_n\rangle \to 0$ as $n \to \infty$ since $P_n^{\perp}x \to 0$ in $\mathcal{H}$, for every
$x \in \mathcal{H}$, i.e., the sequence $(y_n)_{n \in \mathbb{N}}$ converges weakly to 0. Compactness of $A$ implies
that $\|Ay_n\| \to 0$ and thus $d = 0$. Finally observe

\[ d_n = \|A - AP_n\| \qquad \forall n \in \mathbb{N}, \]

hence the compact operator $A$ is the norm limit of the sequence of operators $AP_n$,

\[ AP_n x = \sum_{j=1}^{n} \langle e_j, x\rangle Ae_j \qquad \forall x \in \mathcal{H}, \]

which are of finite rank.
The set of compact operators is stable under uniform limits: 


Theorem 25.3 Suppose $\mathcal{H}$ and $\mathcal{K}$ are Hilbert spaces over the field $\mathbb{K}$ and
$A_n : \mathcal{H} \to \mathcal{K}$, $n \in \mathbb{N}$, are compact operators, and suppose that $A$ is the norm limit of
this sequence. Then $A$ is compact.


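Proof Take any sequence $(x_j)_{j \in \mathbb{N}}$ in $\mathcal{H}$ which converges weakly to $x \in \mathcal{H}$. Such a
sequence is strongly bounded: There is $0 < C < \infty$ such that $\|x_j\|_{\mathcal{H}}, \|x\|_{\mathcal{H}} \le C$
for all $j \in \mathbb{N}$. Since $A$ is the norm limit of the compact operators $A_n$, given $\varepsilon > 0$
there is $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$ we know $\|A - A_n\| < \varepsilon/4C$. Fix $n \ge n_0$
and estimate for $j \in \mathbb{N}$

\begin{align*}
\|Ax - Ax_j\|_{\mathcal{K}} &\le \|(A - A_n)(x - x_j)\|_{\mathcal{K}} + \|A_n x - A_n x_j\|_{\mathcal{K}} \\
                  &\le \|A - A_n\|\, \|x - x_j\|_{\mathcal{H}} + \|A_n x - A_n x_j\|_{\mathcal{K}} \\
                  &\le \frac{\varepsilon}{4C}\, 2C + \|A_n x - A_n x_j\|_{\mathcal{K}} = \varepsilon/2 + \|A_n x - A_n x_j\|_{\mathcal{K}}.
\end{align*}

Since $A_n$ is compact, there is $j_0 \in \mathbb{N}$ such that for all $j \ge j_0$ we know
$\|A_n x - A_n x_j\|_{\mathcal{K}} < \varepsilon/2$ and therefore for these $j \ge j_0$ one has $\|Ax - Ax_j\|_{\mathcal{K}} < \varepsilon$;
thus $A$ maps weakly convergent sequences to norm convergent sequences and we
conclude by Theorem 25.1.

Some important properties of compact operators are collected in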


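Theorem 25.4 For a Hilbert space $\mathcal{H}$ denote the set of all compact operators
$A : \mathcal{H} \to \mathcal{H}$ by $\mathcal{B}_c(\mathcal{H})$. Then


a) $\mathcal{B}_c(\mathcal{H})$ is a linear subspace of $\mathcal{B}(\mathcal{H})$

b) $A \in \mathcal{B}(\mathcal{H})$ is compact, if and only if, its adjoint $A^*$ is compact

c) $A \in \mathcal{B}_c(\mathcal{H})$, $\lambda \in \mathbb{K}$, $\lambda \neq 0 \Rightarrow \dim N(A - \lambda I) < \infty$, i.e., the eigenspaces of
compact operators for eigenvalues different from zero are finite dimensional

d) $\mathcal{B}_c(\mathcal{H})$ is a closed subalgebra of the $C^*$-algebra $\mathcal{B}(\mathcal{H})$ and thus itself a $C^*$-algebra

e) $\mathcal{B}_c(\mathcal{H})$ is a closed ideal in $\mathcal{B}(\mathcal{H})$, i.e.,

\[ \mathcal{B}_c(\mathcal{H}) \cdot \mathcal{B}(\mathcal{H}) \subseteq \mathcal{B}_c(\mathcal{H}) \quad \text{and} \quad \mathcal{B}(\mathcal{H}) \cdot \mathcal{B}_c(\mathcal{H}) \subseteq \mathcal{B}_c(\mathcal{H}). \]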


Proof With the exception of part b) the proofs are relatively simple. Here we only 
prove part c), the other parts are done as an exercise. 

Suppose $\lambda \neq 0$ is an eigenvalue of $A$. Then we have
$I|_{N(A - \lambda I)} = \frac{1}{\lambda} A|_{N(A - \lambda I)}$. Thus, the identity operator on the subspace
$N(A - \lambda I)$ is compact, hence this subspace must have a finite dimension.

As a conclusion of this section a simple example of a compact operator in an 
infinite dimensional Hilbert space is discussed. 

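Suppose that $\{e_j : j \in \mathbb{N}\}$ is an orthonormal basis of the Hilbert space $\mathcal{H}$. Define
a linear operator by continuous linear extension of $Ae_j = \frac{1}{j}e_j$, $j \in \mathbb{N}$, i.e., define

\[ Ax = \sum_{j=1}^{\infty} \frac{a_j}{j} e_j, \qquad x = \sum_{j=1}^{\infty} a_j e_j, \quad \sum_{j=1}^{\infty} |a_j|^2 = \|x\|^2. \]

It follows that the open unit ball $B_1(0) \subset \mathcal{H}$ is mapped by $A$ onto a subset, which
is isomorphic to a subset of the Hilbert cube $W$ (under the standard isomorphism
between a separable Hilbert space over $\mathbb{K}$ and $\ell^2(\mathbb{K})$),

\[ W = \Big\{(v_n)_{n \in \mathbb{N}} \in \ell^2(\mathbb{K}) : |v_n| \le \frac{1}{n},\ \forall n \in \mathbb{N}\Big\}. \]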


The following lemma shows compactness of the Hilbert cube, hence A is a compact 
operator. 


Lemma 25.1 The Hilbert cube $W$ is a compact subset of the Hilbert space $\ell^2(\mathbb{K})$.


Proof In an infinite dimensional Hilbert space closed bounded convex sets need not be
compact (in the norm topology) but they are always weakly compact (see Theorem 19.7).
Compactness of $W$ follows from the observation that the strong and the weak topology
coincide on $W$. For this it suffices to show that every weakly convergent sequence
$(x_n)_{n \in \mathbb{N}} \subset W$ with weak limit $x \in W$ also converges strongly to $x$. Given $\varepsilon > 0$ there
is $p \in \mathbb{N}$ such that $\sum_{j=p+1}^{\infty} \frac{1}{j^2} < \varepsilon$. Because of weak convergence of the sequence
we can now find $N \in \mathbb{N}$ such that $\sum_{j=1}^{p} |x_{n,j} - x_j|^2 < \varepsilon$ for all $n \ge N$ (we use the
notation $x_n = (x_{n,j})_{j \in \mathbb{N}}$). Since $|x_{n,j} - x_j|^2 \le \frac{4}{j^2}$ for elements of $W$, this gives
$\sum_{j=1}^{\infty} |x_{n,j} - x_j|^2 < 5\varepsilon$ for all $n \ge N$ and thus $x$ is the strong limit of the sequence
$(x_n)_{n \in \mathbb{N}} \subset W$.




25.2 Spectral Theory 


Compact operators are defined as linear operators with very strong continuity require- 
ments. They are those continuous operators which map weakly convergent sequences 
into strongly convergent ones (see Theorem 25.1). As a consequence their generic 
form is relatively simple and their spectrum consists only of eigenvalues. These re- 
sults and some applications are discussed in this chapter. Compact operators were 
studied intensively in the early period of Hilbert space theory (1904-1940). 


25.2.1 The Results of Riesz and Schauder 


The key to the spectral theory of self-adjoint compact operators $A$ is a lemma which
states that either $\|A\|$ or $-\|A\|$ is an eigenvalue of $A$. This lemma actually solves
the extremal problem: Find the maximum of the function $x \mapsto |\langle x, Ax\rangle|$ on the set
$S_1 = \{x \in \mathcal{H} : \|x\| = 1\}$. In the last part of this book a general theory for such
extremal problems under constraints (and many other similar problems) will be
presented. Here, however, we present a direct proof which is independent of these
results.


Lemma 25.2 Suppose that $A$ is a compact self-adjoint operator in a complex
Hilbert space $\mathcal{H}$. Then at least one of the two numbers $\pm\|A\|$ is an eigenvalue of
$A$.


Proof For a self-adjoint operator the norm can be calculated as

\[ \|A\| = \sup\{|\langle x, Ax\rangle| : \|x\| = 1\}. \]

Thus, there is a sequence $(x_n)_{n \in \mathbb{N}}$ in $S_1$ such that $\|A\| = \lim_{n \to \infty} |\langle x_n, Ax_n\rangle|$. We can
assume that $\lim_{n \to \infty}\langle x_n, Ax_n\rangle$ exists, otherwise we would take a subsequence. Call
this limit $a$. Then we know $|a| = \|A\|$. Since $A$ is self-adjoint this limit is real. Since
the closed unit ball of $\mathcal{H}$ is weakly compact (Theorem 19.7) there is a subsequence
$(x_{n(j)})_{j \in \mathbb{N}}$ which converges weakly and for which the sequence of images $(Ax_{n(j)})_{j \in \mathbb{N}}$
converges strongly, to $x$ respectively to $y$. The estimate

\[ 0 \le \|Ax_{n(j)} - ax_{n(j)}\|^2 = \|Ax_{n(j)}\|^2 - 2a\langle x_{n(j)}, Ax_{n(j)}\rangle + a^2 \le 2a^2 - 2a\langle x_{n(j)}, Ax_{n(j)}\rangle \]

shows that the sequence $(Ax_{n(j)} - ax_{n(j)})_{j \in \mathbb{N}}$ converges strongly to 0. Since we
know strong convergence of the sequence $(Ax_{n(j)})_{j \in \mathbb{N}}$ we deduce that the sequence
$(ax_{n(j)})_{j \in \mathbb{N}}$ converges not only weakly but strongly to $ax$, hence $\|x\| = 1$ (for $A \neq 0$,
so that $a \neq 0$; for $A = 0$ there is nothing to prove). Continuity
of $A$ implies $\lim_{j \to \infty} Ax_{n(j)} = Ax$ and thus $Ax = ax$. Hence, $a$ is an eigenvalue of
$A$.
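In finite dimensions the statement of the lemma can be observed directly. The Python sketch below samples the quadratic form $x \mapsto \langle x, Ax\rangle$ of a real symmetric $2 \times 2$ matrix on the real unit circle (for such matrices real unit vectors suffice to reach the extremum) and compares the value of largest modulus with the exact eigenvalues. The sample matrix and the function names are our own assumptions.

```python
import math

def quadratic_form(A, t):
    """<x, A x> for the real unit vector x = (cos t, sin t) and a real symmetric 2x2 matrix A."""
    x = (math.cos(t), math.sin(t))
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def extremal_value(A, samples=20_000):
    """Sampled value of <x, A x> with the largest modulus; t in [0, pi) covers x up to sign."""
    return max((quadratic_form(A, math.pi * k / samples) for k in range(samples)), key=abs)

def symmetric_eigenvalues(A):
    """Exact eigenvalues (smaller, larger) of a real symmetric 2x2 matrix."""
    mean = (A[0][0] + A[1][1]) / 2
    radius = math.hypot((A[0][0] - A[1][1]) / 2, A[0][1])
    return mean - radius, mean + radius

# Hypothetical sample matrix; here the extremum is attained at the negative eigenvalue.
A_SYMMETRIC = [[2.0, 1.0], [1.0, -3.0]]
print(extremal_value(A_SYMMETRIC), symmetric_eigenvalues(A_SYMMETRIC))
```

For the sample matrix the extremal value agrees with the eigenvalue of largest modulus, which is negative; this is the case $-\|A\|$ of the lemma.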


Repeated application of this lemma determines the spectrum of a compact self- 
adjoint operator. 


Theorem 25.5 (Riesz-Schauder Theorem) Suppose $A$ is a self-adjoint compact
operator on a complex Hilbert space. Then


a) $A$ has a sequence of real eigenvalues $\lambda_j \neq 0$ which can be enumerated in such a
way that $|\lambda_1| \ge |\lambda_2| \ge |\lambda_3| \ge \cdots$.

b) If there are infinitely many eigenvalues, then $\lim_{j \to \infty} \lambda_j = 0$, and the only
accumulation point of the set of eigenvalues is the point 0.

c) The multiplicity of every eigenvalue $\lambda_j \neq 0$ is finite.

d) If $e_j$ is the eigenvector for the eigenvalue $\lambda_j$, then every vector in the range of $A$
has the representation

\[ Ax = \sum_{j=1}^{\infty} \lambda_j \langle e_j, x\rangle e_j \]

e) $\sigma(A) = \{\lambda_1, \lambda_2, \ldots, 0\}$ but 0 is not necessarily an eigenvalue of $A$.


Proof Lemma 25.2 gives the existence of an eigenvalue 2; with |A,| = ||A|| and 
a normalized eigenvector e,. Introduce the orthogonal complement H; = {e;}+ of 
this eigenvector. The operator A maps the space H into itself: For x € H, we find 
(e;, Ax) = (Ae;,x) = Aj (e,,x) = 0, hence Ax € H,. The restriction of the inner 
product of H to H; makes this space a Hilbert space and the restriction A; = A), 
of A to this Hilbert space is again a self-adjoint compact operator. Clearly, its norm 
is bounded by that of A: ||A,|| < ||A|l. 

Now, apply Lemma 25.2 to the operator $A_1$ on the Hilbert space $\mathcal{H}_1$ to get an eigenvalue $\lambda_2$ and a normalized eigenvector $e_2 \in \mathcal{H}_1$, such that $|\lambda_2| = \|A_1\| \le \|A\| = |\lambda_1|$.

Next, introduce the subspace $\mathcal{H}_2 = \{e_1, e_2\}^{\perp}$. Again, the operator $A$ leaves this subspace invariant and thus the restriction $A_2 = A|_{\mathcal{H}_2}$ is a self-adjoint compact operator in the Hilbert space $\mathcal{H}_2$.

Since we assume that the Hilbert space $\mathcal{H}$ is infinite dimensional, this argument can be iterated infinitely often and thus leads to a sequence of eigenvectors $e_j$ and of eigenvalues $\lambda_j$ with $|\lambda_{j+1}| \le |\lambda_j|$. If there is an $r > 0$ such that $r \le |\lambda_j|$ for all $j$, then the sequence of vectors $y_j = e_j/\lambda_j$ is bounded, and hence there is a weakly convergent subsequence $y_{j(k)}$. Compactness of $A$ implies convergence of the sequence of images $Ay_{j(k)} = e_{j(k)}$, a contradiction since for an orthonormal system one has $\|e_{j(k)} - e_{j(m)}\| = \sqrt{2}$ for $k \neq m$. This proves parts a) and b).

To prove c) observe that on the eigenspace $E_j = N(A - \lambda_j I)$ the identity operator $I|_{E_j}$ is equal to the compact operator $\frac{1}{\lambda_j} A|_{E_j}$ and thus this space has to be finite dimensional.

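The projector onto the subspace $[e_1, \ldots, e_n]$ spanned by the first $n$ eigenvectors is $P_n x = \sum_{j=1}^{n} \langle e_j, x\rangle e_j$. Then $I - P_n$ is the projector onto $[e_1, \ldots, e_n]^{\perp} = \mathcal{H}_n$ and hence $\|A(I - P_n)x\| \le |\lambda_{n+1}| \, \|(I - P_n)x\| \le |\lambda_{n+1}| \, \|x\| \to 0$ as $n \to \infty$. Since $AP_n x = \sum_{j=1}^{n} \lambda_j \langle e_j, x\rangle e_j$ part d) follows.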

Finally, Example 24.2.1 gives immediately that the spectrum of $A$ is $\sigma(A) = \overline{\{\lambda_j : j \in \mathbb{N}\}} = \{\lambda_1, \lambda_2, \ldots, 0\}$ according to part b).
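As a sketch of how parts a) to e) look in a concrete case, consider a diagonal operator. The space $\ell^2(\mathbb{N})$, its canonical basis $(e_n)$ and the choice of eigenvalues $1/n$ are illustrative assumptions and are not taken from the text above.

```latex
% Illustrative example (assumed data): a diagonal operator on \ell^2(\mathbb{N}).
\[
  A x = \sum_{n=1}^{\infty} \frac{1}{n}\,\langle e_n, x\rangle\, e_n ,
  \qquad x \in \ell^2(\mathbb{N}).
\]
% A is self-adjoint, and compact as the norm limit of its finite rank truncations:
\[
  A_N x = \sum_{n=1}^{N} \frac{1}{n}\,\langle e_n, x\rangle\, e_n ,
  \qquad \|A - A_N\| = \frac{1}{N+1} \longrightarrow 0 .
\]
% Eigenvalues: \lambda_n = 1/n, each of multiplicity one, with \lambda_n \to 0.
% Since N(A) = \{0\}, the point 0 is not an eigenvalue, but it belongs to the spectrum:
\[
  \sigma(A) = \{1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots, 0\}.
\]
```

In this example the representation of part d) is just the defining formula of $A$, and 0 enters the spectrum only as the accumulation point of the eigenvalues.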


Corollary 25.1 (Hilbert-Schmidt Theorem) The orthonormal system of eigen- 
functions e; of a compact self-adjoint operator A in a complex Hilbert space is 
complete, if and only if, A has a trivial null space: N(A) = {0}. 




Proof Because of part d) of Theorem 25.5 the system of eigenfunctions is complete if, and only if, the closure of the range of $A$ is the whole Hilbert space: $\overline{\operatorname{ran} A} = \mathcal{H}$. Taking the orthogonal decomposition $\mathcal{H} = N(A) \oplus N(A)^{\perp}$ and $N(A) = N(A^*) = (\operatorname{ran} A)^{\perp}$ into account we conclude.


25.2.2. The Fredholm Alternative 


Given a compact self-adjoint operator $A$ on a complex Hilbert space $\mathcal{H}$ and an element $g \in \mathcal{H}$, consider the equation

\[
f - \mu A f = (I - \mu A) f = g . \tag{25.1}
\]

Depending on the parameter $\mu \in \mathbb{C}$ one wants to find a solution $f \in \mathcal{H}$. Our starting point is the important


Lemma 25.3 (Lemma of Riesz) If $A$ is a compact operator on the Hilbert space $\mathcal{H}$ and $\mu \neq 0$ a complex number, then the range of $I - \mu A$ is closed in $\mathcal{H}$.


Proof Since a scalar multiple of a compact operator is again compact we can and will assume $\mu = 1$. As an abbreviation we introduce the operator $B = I - A$ and have to show that its range is closed. Given an element $f \neq 0$ in the closure of the range of $B$, there is a sequence $(g_n)_{n\in\mathbb{N}}$ in $\mathcal{H}$ such that $f = \lim_{n\to\infty} Bg_n$. According to the decomposition $\mathcal{H} = N(B) \oplus N(B)^{\perp}$ we can and will assume that $g_n \in N(B)^{\perp}$ and $g_n \neq 0$ for all $n \in \mathbb{N}$.

Suppose that the sequence $(g_n)_{n\in\mathbb{N}}$ is bounded in $\mathcal{H}$. Then there is a subsequence which converges weakly to some element $g \in \mathcal{H}$. We denote this subsequence in the same way as the original one. Compactness of $A$ implies that the sequence $(Ag_n)_{n\in\mathbb{N}}$ converges strongly to some $h \in \mathcal{H}$. Weak convergence of the sequence $(g_n)_{n\in\mathbb{N}}$ ensures that $h = Ag$ ($\langle u, Ag_n\rangle = \langle A^*u, g_n\rangle \to \langle A^*u, g\rangle = \langle u, Ag\rangle$ for all $u \in \mathcal{H}$ implies $h = Ag$). Since $Bg_n \to f$ as $n \to \infty$ it follows that $g_n = Bg_n + Ag_n \to f + Ag$. Thus, $(g_n)_{n\in\mathbb{N}}$ converges strongly to $g$ and the identity $g = f + Ag$ holds. This proves $f \in \operatorname{ran}(I - A)$.

Now consider the case that the sequence $(g_n)_{n\in\mathbb{N}}$ is not bounded. By taking a subsequence which we denote in the same way, we can assume $\lim_{n\to\infty} \|g_n\| = \infty$. Form the auxiliary sequence of elements $u_n = \frac{1}{\|g_n\|} g_n$. This sequence is certainly bounded and thus contains a weakly convergent subsequence, which we denote again in the same way. Denote the weak limit of this sequence by $u$. Since $A$ is compact we conclude that $Au_n \to Au$ as $n \to \infty$. Now recall $Bg_n \to f$ and $\|g_n\| \to \infty$ as $n \to \infty$. We deduce $Bu_n = \frac{1}{\|g_n\|} Bg_n \to 0$ as $n \to \infty$, and therefore, the sequence $u_n = Bu_n + Au_n$ converges strongly. Since the weak limit of the sequence is $u$ it converges strongly to $u$. On the other hand, $Bu_n + Au_n$ converges to $Au$ as we have shown. We deduce $u = Au$, i.e., $u \in N(B)$. By construction, $g_n \in N(B)^{\perp}$, hence $u_n \in N(B)^{\perp}$, and this implies $u \in N(B)^{\perp}$ since $N(B)^{\perp}$ is closed. This shows $u \in N(B) \cap N(B)^{\perp} = \{0\}$, and we conclude that $u_n \to 0$ as $n \to \infty$. This contradicts $\|u_n\| = 1$ for all $n \in \mathbb{N}$. Thus, the case of an unbounded sequence $(g_n)_{n\in\mathbb{N}}$ does not occur. This completes the proof.
Now we can formulate and prove 


Theorem 25.6 (Fredholm Alternative) Suppose $A$ is a self-adjoint compact operator on a complex Hilbert space $\mathcal{H}$, $g$ a given element in $\mathcal{H}$ and $\mu$ a complex number. Then either $\mu^{-1} \notin \sigma(A)$ and the equation

\[
(I - \mu A) f = g
\]

has the unique solution

\[
f = (I - \mu A)^{-1} g ,
\]

or $\mu^{-1} \in \sigma(A)$ and the equation $(I - \mu A) f = g$ has a solution if, and only if, $g \in \operatorname{ran}(I - \mu A)$. In this case, given a special solution $f_0$, the general solution is of the form $f = f_0 + u$, with $u \in N(I - \mu A)$, and thus the set of all solutions is a finite dimensional affine subspace of $\mathcal{H}$.


Proof Lemma 25.3 gives

\[
\operatorname{ran}(I - \mu A) = N(I - (\mu A)^*)^{\perp} = N(I - \bar{\mu} A)^{\perp} .
\]

If $\mu^{-1} \notin \sigma(A)$, then $\bar{\mu}^{-1} \notin \sigma(A)$ (since $\sigma(A) \subset \mathbb{R}$) and thus $\operatorname{ran}(I - \mu A) = N(I - \bar{\mu} A)^{\perp} = \{0\}^{\perp} = \mathcal{H}$ and the unique solution is $f = (I - \mu A)^{-1} g$.

Now consider the case $\mu^{-1} \in \sigma(A)$. Then $N(I - \mu A) \neq \{0\}$; $\frac{1}{\mu}$ is an eigenvalue of finite multiplicity (Theorem 25.5, part c)). In this case, $\operatorname{ran}(I - \mu A)$ is a proper subspace of $\mathcal{H}$ and the equation $(I - \mu A) f = g$ has a solution if, and only if, $g \in \operatorname{ran}(I - \mu A)$. Since the equation is linear it is clear that any solution $f$ differs from a special solution $f_0$ by an element $u$ in the null space of the operator $I - \mu A$, and we conclude.
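The two cases of the alternative can be sketched in the finite dimensional setting, where every operator is compact. The helper `solve_fredholm_2x2` below is a hypothetical illustration (not part of the text): it assumes a real symmetric $2 \times 2$ matrix $A$, a real parameter $\mu$, and exact rational arithmetic, and it uses the fact that for symmetric $M = I - \mu A$ the range of $M$ is the orthogonal complement of its null space.

```python
from fractions import Fraction


def solve_fredholm_2x2(A, mu, g):
    """Solve (I - mu*A) f = g for a real symmetric 2x2 matrix A and real mu.

    Returns ("unique", f, None), ("family", f0, null_basis) or ("none", None, None).
    Hypothetical helper for illustration only.
    """
    A = [[Fraction(a) for a in row] for row in A]
    mu = Fraction(mu)
    g = [Fraction(c) for c in g]
    # M = I - mu*A is again symmetric.
    M = [[(1 if i == j else 0) - mu * A[i][j] for j in range(2)] for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det != 0:
        # First case: 1/mu is not an eigenvalue; Cramer's rule gives the unique solution.
        f = [(M[1][1] * g[0] - M[0][1] * g[1]) / det,
             (M[0][0] * g[1] - M[1][0] * g[0]) / det]
        return ("unique", f, None)
    if M[0][0] == 0 and M[1][1] == 0:
        # A singular symmetric M with zero diagonal is the zero matrix:
        # the null space is everything and only g = 0 is solvable.
        if g == [0, 0]:
            return ("family", [Fraction(0), Fraction(0)], [[1, 0], [0, 1]])
        return ("none", None, None)
    # M has rank one: both rows are multiples of a row r with nonzero diagonal entry.
    i = 0 if M[0][0] != 0 else 1
    r = M[i]
    u = [-r[1], r[0]]                      # spans the null space N(I - mu*A)
    if g[0] * u[0] + g[1] * u[1] != 0:     # g is not orthogonal to the null space
        return ("none", None, None)
    t = g[i] / (r[0] * r[0] + r[1] * r[1])
    f0 = [t * r[0], t * r[1]]              # special solution lying in ran(I - mu*A)
    return ("family", f0, [u])


A = [[2, 1], [1, 2]]                        # eigenvalues 1 and 3
print(solve_fredholm_2x2(A, 2, [1, 0]))     # 1/mu = 1/2 is not an eigenvalue
print(solve_fredholm_2x2(A, 1, [1, 1]))     # 1/mu = 1 is an eigenvalue, g in the range
print(solve_fredholm_2x2(A, 1, [1, 0]))     # 1/mu = 1 is an eigenvalue, g not in the range
```

In the second call every solution is of the form $f_0 + c\,u$ with the returned special solution $f_0$ and null vector $u$, which is the affine solution set described in the theorem.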


Remark 25.1 


1. The Fredholm alternative states that the eigenvalue problem for a compact self-adjoint operator in an infinite dimensional Hilbert space and that of self-adjoint operators in a finite dimensional Hilbert space have the same type of solutions. According to this theorem one has the following alternative: Either the equation $Af = \lambda f$ has a solution, i.e., $\lambda \in \sigma_p(A)$, or $(\lambda I - A)^{-1}$ exists, i.e., $\lambda \in \rho(A)$, in other words, $\sigma(A) \setminus \{0\} = \sigma_p(A) = \sigma_d(A)$. Note that for self-adjoint operators, which are not compact, this alternative does not hold. An example is discussed in the Exercises.

2. In applications one encounters the first case rather frequently. Given $r > 0$ consider those $\mu \in \mathbb{C}$ with $|\mu| < r$. Then there are only a finite number of complex numbers $\mu$ for which one cannot have existence and uniqueness of the solution.

3. Every complex N x N matrix has at least one eigenvalue (fundamental theorem of 
algebra). The corresponding statement does not hold in the infinite dimensional 
case. There are compact operators which are not self-adjoint and which have no 
eigenvalues. The Exercises offer an example. 




25.3. Exercises 


1. Prove: For noncompact self-adjoint operators the Fredholm alternative does not hold: In $L^2(\mathbb{R})$ the equation $Af = f$ has no solution and $(I - A)^{-1}$ does not exist for the operator $(Af)(x) = x f(x)$, for all $f \in D(A)$ where $D(A) = \{f \in L^2(\mathbb{R}) : xf \in L^2(\mathbb{R})\}$.

2. On the Hilbert space $\mathcal{H} = \ell^2(\mathbb{C})$ consider the operator $A$ defined by

\[
A(x_1, x_2, x_3, \ldots) = \left(0, x_1, \frac{x_2}{2}, \ldots, \frac{x_n}{n}, \ldots\right).
\]

Show that $A$ is compact and not self-adjoint and has no eigenvalues.
3. This problem is about the historical origin of the Fredholm alternative. It was developed in the study of integral equations. We consider the Fredholm integral equation of second kind:

\[
f(x) - \mu \int k(x, y) f(y) \, dy = g(x) .
\]

Show: For $k \in L^2(\mathbb{R}^n \times \mathbb{R}^n)$ with $k(x, y) = \overline{k(y, x)}$ the operator $A$ defined by $(Af)(x) = \int k(x, y) f(y) \, dy$ is compact and self-adjoint and the Fredholm alternative applies.

As a concrete case of the above integral equation consider the case $n = 1$ and $k = G$, where $G$ is the Green's function of the Sturm-Liouville problem: On the interval $[a, b]$ find the solution of the following second-order linear differential equation with the given boundary conditions:

\[
y''(x) - q(x) y'(x) + \mu y(x) = f(x),
\]
\[
h_1 y(a) + k_1 y'(a) = 0, \qquad h_2 y(b) + k_2 y'(b) = 0,
\]

with $h_j, k_j \in \mathbb{R}$, and where the $h_j$ and $k_j$ are not simultaneously equal to zero. Every solution $y$ of the Sturm-Liouville problem is a solution of the Fredholm integral equation

\[
y(x) - \mu \int_a^b G(x, z) y(z) \, dz = g(x),
\]

where $g(x) = -\int_a^b G(x, z) f(z) \, dz$, and conversely.

Hints: See Sect. 20 of [1] for further details.
Reference 


1. Vladimirov VS. Equations of mathematical physics. Pure and applied mathematics. Vol. 3. New 
York: Dekker; 1971. 


Chapter 26 
Hilbert-Schmidt and Trace Class Operators 


26.1 Basic Theory 


Since they are closely related, we discuss Hilbert-Schmidt and trace class operators 
together. 


Definition 26.1 A bounded linear operator $A$ on a separable Hilbert space $\mathcal{H}$ is called a Hilbert-Schmidt operator respectively a trace class operator if, and only if, for some orthonormal basis $\{e_n : n \in \mathbb{N}\}$ the sum

\[
\sum_{n=1}^{\infty} \|Ae_n\|^2 = \sum_{n=1}^{\infty} \langle e_n, A^*A e_n\rangle ,
\]

respectively, the sum

\[
\sum_{n=1}^{\infty} \langle e_n, |A| e_n\rangle = \sum_{n=1}^{\infty} \big\| |A|^{1/2} e_n \big\|^2
\]

is finite, where $|A|$ is the modulus of $A$ (Definition 21.5.1).

The set of all Hilbert-Schmidt operators (trace class operators) on $\mathcal{H}$ is denoted by $\mathcal{B}_2(\mathcal{H})$ ($\mathcal{B}_1(\mathcal{H})$).


Lemma 26.1 The two sums in Definition 26.1 are independent of the choice of the particular basis and thus one defines the trace norm $\|\cdot\|_1$ of a trace class operator $A$ by

\[
\|A\|_1 = \sum_{n=1}^{\infty} \langle e_n, |A| e_n\rangle \tag{26.1}
\]

and the Hilbert-Schmidt norm $\|\cdot\|_2$ of a Hilbert-Schmidt operator $A$ by

\[
\|A\|_2 = \|A^*A\|_1^{1/2} = \left( \sum_{n=1}^{\infty} \|Ae_n\|^2 \right)^{1/2} . \tag{26.2}
\]


© Springer International Publishing Switzerland 2015 365 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_26 




Proof Parseval's identity implies for any two orthonormal bases $\{e_n : n \in \mathbb{N}\}$ and $\{f_n : n \in \mathbb{N}\}$ of $\mathcal{H}$ and any bounded linear operator $B$

\[
\sum_{n=1}^{\infty} \|Be_n\|^2 = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} |\langle Be_n, f_m\rangle|^2 = \sum_{m=1}^{\infty} \|B^* f_m\|^2 .
\]

Take another orthonormal basis $\{h_n : n \in \mathbb{N}\}$; the same calculation, applied to $B^*$, then shows that we can continue the above identity by

\[
\sum_{m=1}^{\infty} \|B^* f_m\|^2 = \sum_{n=1}^{\infty} \|B^{**} h_n\|^2 = \sum_{n=1}^{\infty} \|B h_n\|^2 ,
\]

since $B^{**} = B$, and hence this sum is independent of the particular basis. If we apply this identity for $B = |A|^{1/2}$ we see that the defining sum for trace class is independent of the particular choice of the basis.
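The basis independence can be checked numerically in a finite dimensional sketch. The matrix `A`, the rotation angle `theta` and the helper `hs_norm_squared` are illustrative assumptions; in $\mathbb{R}^2$ the sum $\sum_n \|Ae_n\|^2$ is simply the sum of the squared matrix entries.

```python
import math


def hs_norm_squared(A, basis):
    """Return the sum of ||A e||^2 over the vectors e of the given basis.

    A is a matrix given as a list of rows; basis is a list of vectors.
    """
    total = 0.0
    for e in basis:
        Ae = [sum(a * c for a, c in zip(row, e)) for row in A]
        total += sum(c * c for c in Ae)
    return total


A = [[1.0, 2.0], [3.0, 4.0]]
standard = [[1.0, 0.0], [0.0, 1.0]]
theta = 0.7  # arbitrary rotation angle (assumption for the example)
rotated = [[math.cos(theta), math.sin(theta)],
           [-math.sin(theta), math.cos(theta)]]

s_standard = hs_norm_squared(A, standard)
s_rotated = hs_norm_squared(A, rotated)
print(s_standard, s_rotated)  # both agree up to rounding
```

Both sums equal $1 + 4 + 9 + 16 = 30$ up to floating point rounding, as the lemma predicts for any orthonormal basis.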


Corollary 26.1 For every $A \in \mathcal{B}_2(\mathcal{H})$ one has $\|A\|_2 = \|A^*\|_2$.


Proof This is immediate from the proof of Lemma 26.1. 
Basic properties of the set of all Hilbert-Schmidt operators and of the Hilbert— 
Schmidt norm are collected in the following theorem. 


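Theorem 26.1

a) $\mathcal{B}_2(\mathcal{H})$ is a vector space which is invariant under taking adjoints, i.e., $A \in \mathcal{B}_2(\mathcal{H})$ if, and only if, $A^* \in \mathcal{B}_2(\mathcal{H})$; furthermore, for all $A \in \mathcal{B}_2(\mathcal{H})$,

\[
\|A^*\|_2 = \|A\|_2 .
\]

b) The Hilbert-Schmidt norm $\|\cdot\|_2$ dominates the operator norm $\|\cdot\|$, i.e.,

\[
\|A\| \le \|A\|_2
\]

for all $A \in \mathcal{B}_2(\mathcal{H})$.

c) For all $A \in \mathcal{B}_2(\mathcal{H})$ and all $B \in \mathcal{B}(\mathcal{H})$ one has $AB \in \mathcal{B}_2(\mathcal{H})$ and $BA \in \mathcal{B}_2(\mathcal{H})$ with the estimates

\[
\|AB\|_2 \le \|A\|_2 \|B\|, \qquad \|BA\|_2 \le \|B\| \, \|A\|_2 ,
\]

i.e., $\mathcal{B}_2(\mathcal{H})$ is a two-sided ideal in $\mathcal{B}(\mathcal{H})$.

d) The vector space $\mathcal{B}_2(\mathcal{H})$ is a Hilbert space with the inner product

\[
\langle A, B\rangle_{HS} = \sum_{n=1}^{\infty} \langle Ae_n, Be_n\rangle = \operatorname{Tr}(A^*B), \qquad A, B \in \mathcal{B}_2(\mathcal{H}) \tag{26.3}
\]

and the Hilbert-Schmidt norm is defined by this inner product by $\|A\|_2 = \sqrt{\langle A, A\rangle_{HS}}$.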




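Proof

(a) It is obvious that scalar multiples $\lambda A$ of elements $A \in \mathcal{B}_2(\mathcal{H})$ again belong to $\mathcal{B}_2(\mathcal{H})$. If $A, B \in \mathcal{B}_2(\mathcal{H})$ and if $\{e_n\}$ is an ONB then the estimate

\[
\|(A + B)e_n\|^2 \le 2 \left( \|Ae_n\|^2 + \|Be_n\|^2 \right)
\]

immediately implies $A + B \in \mathcal{B}_2(\mathcal{H})$. Thus, $\mathcal{B}_2(\mathcal{H})$ is a vector space. Corollary 26.1 now implies that for $A \in \mathcal{B}_2(\mathcal{H})$ also $A^* \in \mathcal{B}_2(\mathcal{H})$ and $\|A^*\|_2 = \|A\|_2$.

(b) Any given unit vector $h \in \mathcal{H}$ can be considered as an element of an ONB $\{e_n\}$, therefore we can estimate for $A \in \mathcal{B}_2(\mathcal{H})$

\[
\|Ah\|^2 \le \sum_n \|Ae_n\|^2 = \|A\|_2^2 ,
\]

and it follows

\[
\|A\| = \sup \{ \|Ah\| : h \in \mathcal{H}, \|h\| = 1 \} \le \|A\|_2 .
\]

(c) For $A \in \mathcal{B}_2(\mathcal{H})$ and $B \in \mathcal{B}(\mathcal{H})$ and every basis vector $e_n$ one has $\|BAe_n\|^2 \le \|B\|^2 \|Ae_n\|^2$ and thus (26.2) implies $\|BA\|_2 \le \|B\| \, \|A\|_2$. Next, part (a) says $\|AB\|_2 = \|(AB)^*\|_2 = \|B^*A^*\|_2 \le \|B^*\| \, \|A^*\|_2 = \|B\| \, \|A\|_2$. It follows that $AB, BA \in \mathcal{B}_2(\mathcal{H})$.

(d) For an ONB $\{e_n\}$ and any $A, B \in \mathcal{B}_2(\mathcal{H})$ one has, using Schwarz' inequality twice,

\[
\sum_n |\langle Ae_n, Be_n\rangle| \le \|A\|_2 \|B\|_2 .
\]

We conclude that (26.3) is well defined on $\mathcal{B}_2(\mathcal{H})$ and then that it is antilinear in the first and linear in the second argument. Obviously $\langle A, A\rangle_{HS} \ge 0$ for all $A \in \mathcal{B}_2(\mathcal{H})$ and $\langle A, A\rangle_{HS} = 0$ if, and only if, $Ae_n = 0$ for all elements $e_n$ of an ONB of $\mathcal{H}$, hence $A = 0$. Therefore, (26.3) is an inner product on $\mathcal{B}_2(\mathcal{H})$ and clearly this inner product defines the Hilbert-Schmidt norm.

Finally, we show completeness of this inner product space. Suppose that $\{A_n\}$ is a Cauchy sequence in $\mathcal{B}_2(\mathcal{H})$. Then, given $\varepsilon > 0$, there is $n_0$ such that $\|A_m - A_n\|_2 \le \varepsilon$ for all $m, n \ge n_0$. Since $\|A\| \le \|A\|_2$, this sequence is also a Cauchy sequence in $\mathcal{B}(\mathcal{H})$, and hence it converges to a unique $A \in \mathcal{B}(\mathcal{H})$, by Theorem 21.3.3. For an ONB $\{e_j\}$, $n \ge n_0$, and all $N \in \mathbb{N}$, we have

\[
\sum_{j=1}^{N} \|(A - A_n)e_j\|^2 = \lim_{m\to\infty} \sum_{j=1}^{N} \|(A_m - A_n)e_j\|^2 \le \lim_{m\to\infty} \|A_m - A_n\|_2^2 \le \varepsilon^2
\]

and conclude

\[
\sum_{j=1}^{\infty} \|(A - A_n)e_j\|^2 \le \varepsilon^2 .
\]

This shows $A - A_n \in \mathcal{B}_2(\mathcal{H})$, hence $A = A_n + (A - A_n) \in \mathcal{B}_2(\mathcal{H})$ and $\|A - A_n\|_2 \le \varepsilon$ for $n \ge n_0$, and $A$ is the limit of the sequence $\{A_n\}$ in the Hilbert-Schmidt norm.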




Though trace class operators share many properties with Hilbert-Schmidt operators, some of the proofs are more complicated. For instance, the fact that the modulus of a bounded linear operator is not subadditive does not allow such a simple proof of the fact that the set of all trace class operators is closed under addition of operators, as in the case of Hilbert-Schmidt operators. For this and some other important properties, the following proposition will provide substantial simplifications in the proofs (see also [2]).


Proposition 26.1 For a bounded linear operator $A$ on $\mathcal{H}$, the following statements are equivalent:

(a) For some (and then for every) ONB $\{e_n\}$ one has $S_1(A) = \sum_n \langle e_n, |A| e_n\rangle < \infty$.
(b) $S_2(A) = \inf \{ \|B\|_2 \|C\|_2 : B, C \in \mathcal{B}_2(\mathcal{H}), A = BC \} < \infty$.
(c) $S_3(A) = \sup \{ \sum_n |\langle e_n, A f_n\rangle| : \{e_n\}, \{f_n\} \text{ are ONS in } \mathcal{H} \} < \infty$.

For $A \in \mathcal{B}_1(\mathcal{H})$ one has $S_1(A) = S_2(A) = S_3(A) = \|A\|_1$.


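Proof Suppose $S_1(A) < \infty$. Write the polar decomposition $A = U|A|$ as $A = BC$ with $B = U|A|^{1/2}$ and $C = |A|^{1/2}$. Then

\[
\|B\|_2^2 = \sum_n \big\| U|A|^{1/2} e_n \big\|^2 \le \sum_n \big\| |A|^{1/2} e_n \big\|^2 = S_1(A) < \infty
\]

and $\|C\|_2^2 = S_1(A) < \infty$ and thus $S_2(A) \le S_1(A) < \infty$.

Next suppose that $S_2(A) < \infty$ and take any ONS $\{e_n\}$ and $\{f_n\}$ in $\mathcal{H}$. Write $A = BC$ with $B, C \in \mathcal{B}_2(\mathcal{H})$ and estimate

\[
\sum_n |\langle e_n, A f_n\rangle| = \sum_n |\langle B^* e_n, C f_n\rangle| \le \left( \sum_n \|B^* e_n\|^2 \right)^{1/2} \left( \sum_n \|C f_n\|^2 \right)^{1/2} \le \|B^*\|_2 \|C\|_2 = \|B\|_2 \|C\|_2 .
\]

It follows $S_3(A) \le S_2(A) < \infty$.

Finally, assume that $S_3(A) < \infty$ and take an ONB $\{e_n\}$ for $\operatorname{Ran}(|A|)$. Then $f_n = Ue_n$ is an ONB for $\operatorname{Ran}(A)$ and thus we can estimate

\[
S_1(A) = \sum_n \langle e_n, |A| e_n\rangle = \sum_n \langle e_n, U^* A e_n\rangle = \sum_n \langle f_n, A e_n\rangle \le S_3(A),
\]

and therefore $S_1(A) \le S_3(A) < \infty$.

If $A \in \mathcal{B}_1(\mathcal{H})$, then by definition $S_1(A) < \infty$. The above chain of estimates shows $S_3(A) \le S_2(A) \le S_1(A) \le S_3(A)$ and thus we have equality.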


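Theorem 26.2

(a) $\mathcal{B}_1(\mathcal{H})$ is a vector space which is invariant under taking adjoints, i.e., $A \in \mathcal{B}_1(\mathcal{H})$ if, and only if, $A^* \in \mathcal{B}_1(\mathcal{H})$; furthermore, for all $A \in \mathcal{B}_1(\mathcal{H})$,

\[
\|A^*\|_1 = \|A\|_1 .
\]

(b) The trace norm $\|\cdot\|_1$ dominates the operator norm $\|\cdot\|$, i.e.,

\[
\|A\| \le \|A\|_1
\]

for all $A \in \mathcal{B}_1(\mathcal{H})$.

(c) For all $A \in \mathcal{B}_1(\mathcal{H})$ and all $B \in \mathcal{B}(\mathcal{H})$ one has $AB \in \mathcal{B}_1(\mathcal{H})$ and $BA \in \mathcal{B}_1(\mathcal{H})$ with the estimates

\[
\|AB\|_1 \le \|A\|_1 \|B\|, \qquad \|BA\|_1 \le \|B\| \, \|A\|_1 ,
\]

i.e., $\mathcal{B}_1(\mathcal{H})$ is a two-sided ideal in $\mathcal{B}(\mathcal{H})$.

(d) The vector space $\mathcal{B}_1(\mathcal{H})$ is a Banach space under the trace norm $\|\cdot\|_1$.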


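Proof

(a) For a scalar multiple $\lambda A$ of $A \in \mathcal{B}_1(\mathcal{H})$ one obviously has $S_1(\lambda A) = |\lambda| S_1(A)$ and thus $\lambda A \in \mathcal{B}_1(\mathcal{H})$ and $\|\lambda A\|_1 = |\lambda| \, \|A\|_1$. A simple calculation shows $S_3(A^*) = S_3(A)$ and therefore by Proposition 26.1, $A^* \in \mathcal{B}_1(\mathcal{H})$ whenever $A \in \mathcal{B}_1(\mathcal{H})$ and $\|A^*\|_1 = \|A\|_1$.

For $A, B \in \mathcal{B}_1(\mathcal{H})$ we know by Proposition 26.1 that $S_3(A)$ and $S_3(B)$ are finite. From the definition of $S_3(\cdot)$ we read off $S_3(A + B) \le S_3(A) + S_3(B)$, thus $S_3(A + B)$ is finite, i.e., $A + B \in \mathcal{B}_1(\mathcal{H})$.

If for $A \in \mathcal{B}_1(\mathcal{H})$ one has $\|A\|_1 = 0$, then $|A|^{1/2} e_n = 0$ for all elements of an ONB $\{e_n\}$ of $\mathcal{H}$, hence $|A|^{1/2} = 0$ and thus $A = 0$. Therefore, the trace norm $\|\cdot\|_1$ is indeed a norm on the vector space $\mathcal{B}_1(\mathcal{H})$.

(b) Given unit vectors $e, f \in \mathcal{H}$ we can consider them as being an element of an ONS $\{e_n\}$, respectively, of an ONS $\{f_n\}$; then,

\[
|\langle e, Af\rangle| \le \sum_n |\langle e_n, A f_n\rangle| \le S_3(A),
\]

and it follows

\[
\|A\| = \sup \{ |\langle e, Af\rangle| : e, f \in \mathcal{H}, \|e\| = \|f\| = 1 \} \le S_3(A) = \|A\|_1 .
\]

(c) If $A \in \mathcal{B}_1(\mathcal{H})$ has a decomposition $A = CD$ with $C, D \in \mathcal{B}_2(\mathcal{H})$, then for any $B \in \mathcal{B}(\mathcal{H})$, $BA$ has a decomposition $BA = BCD$ with $BC, D \in \mathcal{B}_2(\mathcal{H})$, by Theorem 26.1, part (c). We conclude

\[
S_2(BA) \le \|BC\|_2 \|D\|_2 \le \|B\| \, \|C\|_2 \|D\|_2
\]

and thus $S_2(BA) \le \|B\| S_2(A) < \infty$. It follows $\|BA\|_1 \le \|B\| \, \|A\|_1$. Since we have established that $\mathcal{B}_1(\mathcal{H})$ is invariant under taking adjoints, we can prove $AB \in \mathcal{B}_1(\mathcal{H})$ as in the case of Hilbert-Schmidt operators.

(d) Finally, we show completeness of the normed space $\mathcal{B}_1(\mathcal{H})$. Suppose that $\{A_n\}$ is a Cauchy sequence in $\mathcal{B}_1(\mathcal{H})$. Then, given $\varepsilon > 0$, there is $n_0$ such that $\|A_m - A_n\|_1 \le \varepsilon$ for all $m, n \ge n_0$. Since $\|A\| \le \|A\|_1$, this sequence is also a Cauchy sequence in $\mathcal{B}(\mathcal{H})$ and hence it converges to a unique $A \in \mathcal{B}(\mathcal{H})$, by Theorem 21.3.3. Fix $n \ge n_0$; for any ONS $\{e_j\}$ and $\{f_j\}$ in $\mathcal{H}$ and any $N \in \mathbb{N}$ we have

\[
\sum_{j=1}^{N} |\langle e_j, (A - A_n) f_j\rangle| = \lim_{m\to\infty} \sum_{j=1}^{N} |\langle e_j, (A_m - A_n) f_j\rangle| \le \limsup_{m\to\infty} S_3(A_m - A_n) \le \varepsilon
\]

and conclude

\[
\sum_{j=1}^{\infty} |\langle e_j, (A - A_n) f_j\rangle| \le \varepsilon .
\]

This shows $A - A_n \in \mathcal{B}_1(\mathcal{H})$, hence $A = A_n + (A - A_n) \in \mathcal{B}_1(\mathcal{H})$ and $\|A - A_n\|_1 \le \varepsilon$.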


Corollary 26.2 The space of trace class operators is continuously embedded into the space of Hilbert-Schmidt operators:

\[
\mathcal{B}_1(\mathcal{H}) \hookrightarrow \mathcal{B}_2(\mathcal{H}).
\]

Proof According to the definitions one has $\|A\|_2^2 = \|A^*A\|_1$. When we apply parts (c), (b), and (a) of Theorem 26.2 in this order we can estimate

\[
\|A\|_2^2 = \|A^*A\|_1 \le \|A^*\| \, \|A\|_1 \le \|A^*\|_1 \|A\|_1 = \|A\|_1^2
\]

and thus $\|A\|_2 \le \|A\|_1$ which implies our claim.
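The chain $\|A\| \le \|A\|_2 \le \|A\|_1$ can be checked in a small finite dimensional sketch. The helper `norms_2x2` is hypothetical; it assumes a real $2 \times 2$ matrix and uses the standard facts that its singular values $s_1 \ge s_2 \ge 0$ satisfy $s_1^2 + s_2^2 = \sum_{i,j} a_{ij}^2$ and $s_1 s_2 = |\det A|$, and that the operator, Hilbert-Schmidt and trace norms are $s_1$, $(s_1^2 + s_2^2)^{1/2}$ and $s_1 + s_2$.

```python
import math


def norms_2x2(A):
    """Return (operator norm, Hilbert-Schmidt norm, trace norm) of a real 2x2 matrix.

    The squared singular values are the roots of t^2 - F*t + det^2 = 0,
    where F is the sum of the squared entries of A.
    """
    (a, b), (c, d) = A
    fro_sq = a * a + b * b + c * c + d * d
    det = abs(a * d - b * c)
    disc = math.sqrt(max(fro_sq * fro_sq - 4.0 * det * det, 0.0))
    s1 = math.sqrt((fro_sq + disc) / 2.0)
    s2 = math.sqrt(max((fro_sq - disc) / 2.0, 0.0))
    return s1, math.sqrt(fro_sq), s1 + s2


op, hs, tr = norms_2x2([[3.0, 0.0], [0.0, 4.0]])
print(op, hs, tr)  # singular values 4 and 3: norms close to 4, 5 and 7
```

For the diagonal example the three norms are 4, 5 and 7 up to rounding, consistent with the embedding estimate $\|A\|_2 \le \|A\|_1$ and with Theorem 26.1, part b).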


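Corollary 26.3 On the space of all trace class operators the trace is well defined by ($\{e_n\}$ is any ONB of $\mathcal{H}$)

\[
\operatorname{Tr}(A) = \sum_n \langle e_n, A e_n\rangle , \qquad A \in \mathcal{B}_1(\mathcal{H}). \tag{26.4}
\]

This function $\operatorname{Tr} : \mathcal{B}_1(\mathcal{H}) \to \mathbb{K}$ is linear and satisfies for all $A \in \mathcal{B}_1(\mathcal{H})$

a) $|\operatorname{Tr}(A)| \le \|A\|_1$

b) $\operatorname{Tr}(A^*) = \overline{\operatorname{Tr}(A)}$

c) $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$, for all $B \in \mathcal{B}_1(\mathcal{H})$

d) $\operatorname{Tr}(UAU^*) = \operatorname{Tr}(A)$ for all unitary operators $U$ on $\mathcal{H}$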


Proof We know that $A \in \mathcal{B}_1(\mathcal{H})$ can be written as $A = BC$ with $B, C \in \mathcal{B}_2(\mathcal{H})$. For any ONB $\{e_n\}$, we estimate

\[
\sum_n |\langle e_n, A e_n\rangle| \le \sum_n \|B^* e_n\| \, \|C e_n\| \le \|B^*\|_2 \|C\|_2 = \|B\|_2 \|C\|_2 < \infty
\]

and conclude

\[
\sum_n |\langle e_n, A e_n\rangle| \le S_2(A) = \|A\|_1 .
\]

As earlier one proves that the sum in (26.4) does not depend on the choice of the particular basis. Linearity in $A \in \mathcal{B}_1(\mathcal{H})$ is obvious. Thus (a) holds. The proof of (b) is an elementary calculation.




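If $A, B \in \mathcal{B}_1(\mathcal{H})$ choose another ONB $\{f_m\}$ and use the completeness relation to calculate

\[
\sum_n \langle e_n, AB e_n\rangle = \sum_n \langle A^* e_n, B e_n\rangle = \sum_n \sum_m \langle A^* e_n, f_m\rangle \langle f_m, B e_n\rangle
\]
\[
= \sum_m \sum_n \langle B^* f_m, e_n\rangle \langle e_n, A f_m\rangle = \sum_m \langle B^* f_m, A f_m\rangle = \sum_m \langle f_m, BA f_m\rangle ,
\]

hence (c) holds, since we know that the above series converge absolutely so that the order of summation can be exchanged. Part (d) is just a reformulation of the fact that the trace is independent of the basis which is used to calculate it.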
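Property c) can be illustrated with matrices, where every operator is trace class. The matrices `A` and `B` and the helpers `matmul` and `trace` below are illustrative assumptions; the example also shows that $AB$ and $BA$ themselves differ although their traces agree.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(A):
    """Sum of the diagonal entries, i.e. the sum of <e_n, A e_n> over the standard basis."""
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
AB = matmul(A, B)
BA = matmul(B, A)
print(AB, trace(AB))
print(BA, trace(BA))
```

Here $AB \neq BA$, while both traces are equal, in line with $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$.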


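Theorem 26.3 (Spectral Representation of Hilbert-Schmidt and Trace Class Operators) Let $\mathcal{H}$ be a separable Hilbert space and denote by $\mathcal{B}_c(\mathcal{H})$ the space of all compact operators on $\mathcal{H}$ (see Theorem 22.3.1). Then

(a) $\mathcal{B}_2(\mathcal{H}) \subset \mathcal{B}_c(\mathcal{H})$, i.e., Hilbert-Schmidt and thus trace class operators are compact.

(b) A bounded operator $A$ on $\mathcal{H}$ is a Hilbert-Schmidt operator, respectively, a trace class operator if, and only if, there are two orthonormal bases $\{e_n\}$ and $\{x_n\}$ of $\mathcal{H}$ and there is a sequence $\{\lambda_n\}$ in $\mathbb{K}$ with

\[
\sum_n |\lambda_n|^2 < \infty, \quad \text{respectively,} \quad \sum_n |\lambda_n| < \infty
\]

such that

\[
Ax = \sum_n \lambda_n \langle e_n, x\rangle x_n \quad \text{for all } x \in \mathcal{H} \tag{26.5}
\]

and then one has

\[
\|A\|_2 = \left( \sum_n |\lambda_n|^2 \right)^{1/2}, \quad \text{respectively,} \quad \|A\|_1 = \sum_n |\lambda_n| .
\]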


Proof

(a) Suppose that $\{e_n\}$ is an ONB of $\mathcal{H}$; denote by $P_N$ the orthogonal projector onto the closed subspace $[e_1, \ldots, e_N]$ spanned by $e_1, \ldots, e_N$. Then for $A \in \mathcal{B}_2(\mathcal{H})$ one has

\[
\|A - AP_N\|_2^2 = \sum_n \|(A - AP_N) e_n\|^2 = \sum_{n=N+1}^{\infty} \|A e_n\|^2 ,
\]

hence $\|A - AP_N\| \le \|A - AP_N\|_2 \to 0$ as $N \to \infty$. Therefore, $A$ is the norm limit of the sequence of finite rank operators $AP_N$ and thus compact by Theorem 22.3.2.

By Corollary 26.2, we know $\mathcal{B}_1(\mathcal{H}) \subset \mathcal{B}_2(\mathcal{H})$, hence trace class operators are compact.

(b) Suppose that $A \in \mathcal{B}_j(\mathcal{H})$, $j = 1$ or $j = 2$, is given. By part (a) we know that $A$ and its modulus $|A|$ are compact. The polar decomposition (Theorem 21.5.2) relates $A$ and $|A|$ by $A = U|A|$ where $U$ is a partial isometry from $\operatorname{ran} |A|$ to $\operatorname{ran} A$.

According to the Riesz-Schauder Theorem (25.5) the compact operator $|A|$ has the following spectral representation:

\[
|A| x = \sum_j \lambda_j \langle e_j, x\rangle e_j \quad \text{for all } x \in \mathcal{H} \tag{26.6}
\]

with the specifications:

(i) $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{n-1} \ge \lambda_n \ge \cdots \ge 0$ are the eigenvalues of $|A|$ enumerated in decreasing order and repeated in this list according to their multiplicity,
(ii) $e_j$ is the normalized eigenvector for the eigenvalue $\lambda_j$,
(iii) the multiplicity of every eigenvalue $\lambda_j > 0$ is finite, and if there are infinitely many eigenvalues then $\lambda_j \to 0$ as $j \to \infty$.

If the ONS $\{e_j\}$ is not complete, we can extend it to an ONB $\{e'_n\}$ of $\mathcal{H}$ and calculate, using (26.6),

\[
\|A\|_1 = \sum_n \langle e'_n, |A| e'_n\rangle = \sum_j \lambda_j < \infty \quad \text{for } A \in \mathcal{B}_1(\mathcal{H}),
\]

respectively

\[
\|A\|_2^2 = \sum_n \big\| |A| e'_n \big\|^2 = \sum_j \lambda_j^2 < \infty \quad \text{for } A \in \mathcal{B}_2(\mathcal{H}).
\]

If we apply the partial isometry $U$ to the representation (26.6), we get (26.5) with $x_n = Ue_n$.

Conversely suppose that an operator $A$ has the representation (26.5). If $\sum_n |\lambda_n|^2 < \infty$ holds one has

\[
\sum_j \|A e_j\|^2 = \sum_j \Big\| \sum_n \lambda_n \langle e_n, e_j\rangle x_n \Big\|^2 = \sum_j |\lambda_j|^2 ,
\]

thus $\|A\|_2^2 = \sum_n |\lambda_n|^2 < \infty$ and therefore $A \in \mathcal{B}_2(\mathcal{H})$.

Suppose that $\sum_n |\lambda_n| < \infty$ holds. In the Exercises, we show that (26.5) implies

\[
|A| x = \sum_n |\lambda_n| \langle e_n, x\rangle e_n , \qquad x \in \mathcal{H}.
\]

It follows $\|A\|_1 = \operatorname{Tr}(|A|) = \sum_n |\lambda_n| < \infty$, hence $A \in \mathcal{B}_1(\mathcal{H})$ and we conclude.
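As a worked illustration of part (b), the following sketch compares two diagonal operators. The orthonormal basis $(e_n)$ with $x_n = e_n$ and the coefficient sequences $1/n$ and $1/n^2$ are assumptions chosen for the example; the values of the series are the classical ones for $\sum 1/n^2$ and $\sum 1/n^4$.

```latex
% Assumed example: x_n = e_n in (26.5) and \lambda_n = 1/n.
\[
  A x = \sum_{n=1}^{\infty} \frac{1}{n}\,\langle e_n, x\rangle\, e_n :
  \qquad
  \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} < \infty ,
  \qquad
  \sum_{n=1}^{\infty} \frac{1}{n} = \infty .
\]
% Hence A is Hilbert-Schmidt with \|A\|_2 = \pi/\sqrt{6}, but A is not trace class.
% With \lambda_n = 1/n^2 instead, both sums converge:
\[
  C x = \sum_{n=1}^{\infty} \frac{1}{n^2}\,\langle e_n, x\rangle\, e_n :
  \qquad
  \|C\|_1 = \frac{\pi^2}{6} ,
  \qquad
  \|C\|_2 = \Big( \sum_{n=1}^{\infty} \frac{1}{n^4} \Big)^{1/2} = \frac{\pi^2}{\sqrt{90}} .
\]
```

The first operator shows that the inclusion of the trace class operators in the Hilbert-Schmidt operators is proper in infinite dimensions, and the second one respects $\|C\|_2 \le \|C\|_1$.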


Remark 26.1 One can define Hilbert-Schmidt and trace class operators also for the case of operators between two different Hilbert spaces as follows: For two separable Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ a bounded linear operator $A : \mathcal{H}_1 \to \mathcal{H}_2$ is called a Hilbert-Schmidt operator if, and only if, there is an orthonormal basis $\{e_n\}$ of $\mathcal{H}_1$ such that

\[
\sum_{n=1}^{\infty} \|A e_n\|_2^2 < \infty,
\]

where $\|\cdot\|_2$ is the norm of $\mathcal{H}_2$.

A bounded linear operator $A : \mathcal{H}_1 \to \mathcal{H}_2$ is called a trace class operator or a nuclear operator if, and only if, its modulus $|A| = \sqrt{A^*A}$ is a trace class operator on $\mathcal{H}_1$.

With these slightly more general definitions the results presented above still hold with obvious modifications. We mention the spectral representation of Theorem 26.3.

A bounded linear operator $A : \mathcal{H}_1 \to \mathcal{H}_2$ is a Hilbert-Schmidt, respectively, a trace class operator if, and only if, it is of the form

\[
Ax = \sum_{n=1}^{\infty} \lambda_n \langle e_n^1, x\rangle_1 \, e_n^2 \quad \text{for all } x \in \mathcal{H}_1, \tag{26.7}
\]

where $\{e_n^i\}$ is an orthonormal system in $\mathcal{H}_i$, $i = 1, 2$, and where the sequence of numbers $\lambda_n \neq 0$ satisfies $\sum_n |\lambda_n|^2 < \infty$ and $\sum_n |\lambda_n| < \infty$, respectively.


26.2 Dual Spaces of the Spaces of Compact and of Trace Class 
Operators 


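The space of linear operators on $\mathcal{H}$ which have a finite rank is denoted by $\mathcal{B}_f(\mathcal{H})$. The following corollary highlights important results which in essence have been proven already in the last few theorems.

Corollary 26.4 For any separable Hilbert space $\mathcal{H}$, one has

\[
\mathcal{B}_f(\mathcal{H}) \subset \mathcal{B}_1(\mathcal{H}) \subset \mathcal{B}_2(\mathcal{H}) \subset \mathcal{B}_c(\mathcal{H}) \subset \mathcal{B}(\mathcal{H})
\]

and all the embeddings are continuous and dense. $\mathcal{B}_f(\mathcal{H})$ is dense in $\mathcal{B}_j(\mathcal{H})$, $j = 1, 2$, and in $\mathcal{B}_c(\mathcal{H})$.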


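Proof In the proof of the last two results, it was shown in particular that

\[
\mathcal{B}_f(\mathcal{H}) \subset \mathcal{B}_j(\mathcal{H}) \subset \mathcal{B}_c(\mathcal{H}), \qquad j = 1, 2
\]

holds and that the finite rank operators are dense in $\mathcal{B}_c(\mathcal{H})$ and in $\mathcal{B}_j(\mathcal{H})$ for $j = 1, 2$. Parts (b) of Theorem 26.1, respectively, Theorem 26.2 imply that the embeddings $\mathcal{B}_j(\mathcal{H}) \hookrightarrow \mathcal{B}_c(\mathcal{H})$, $j = 1, 2$, are continuous when $\mathcal{B}_c(\mathcal{H})$ is equipped with the operator norm. By Corollary 26.2 we conclude.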




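According to Theorem 26.1, the space of Hilbert-Schmidt operators $\mathcal{B}_2(\mathcal{H})$ is a Hilbert space. Hence, according to the definition of the inner product (26.3) the continuous linear functionals $f$ on this space are given by

\[
f(A) = \operatorname{Tr}(BA) \quad \text{for all } A \in \mathcal{B}_2(\mathcal{H}), \quad \text{for some } B = B_f \in \mathcal{B}_2(\mathcal{H}). \tag{26.8}
\]

Part (c) of Theorem 26.2 says that the space of the trace class operators is a two-sided ideal in $\mathcal{B}(\mathcal{H})$, hence $\operatorname{Tr}(BA)$ is well defined for all $B \in \mathcal{B}(\mathcal{H})$ and all $A \in \mathcal{B}_1(\mathcal{H})$ and Part (a) of Corollary 26.3 allows us to estimate this trace by

\[
|\operatorname{Tr}(BA)| \le \|B\| \, \|A\|_1 . \tag{26.9}
\]

Therefore, for fixed $B \in \mathcal{B}(\mathcal{H})$, $f_B(A) = \operatorname{Tr}(BA)$ is a continuous linear functional on $\mathcal{B}_1(\mathcal{H})$, and for fixed $A \in \mathcal{B}_1(\mathcal{H})$, $g_A(B) = \operatorname{Tr}(BA)$ is a continuous linear functional on $\mathcal{B}(\mathcal{H})$. Here we are interested in the space $\mathcal{B}_1(\mathcal{H})'$ of all continuous linear functionals on $\mathcal{B}_1(\mathcal{H})$ and in the space $\mathcal{B}_c(\mathcal{H})'$ of all continuous linear functionals on $\mathcal{B}_c(\mathcal{H}) \subset \mathcal{B}(\mathcal{H})$. Note that according to Corollary 26.4 the restriction of a continuous linear functional on $\mathcal{B}_c(\mathcal{H})$ to $\mathcal{B}_2(\mathcal{H})$ is a continuous linear functional on $\mathcal{B}_2(\mathcal{H})$ and thus given by the trace, i.e., formula (26.8).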


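Theorem 26.4 For a separable Hilbert space $\mathcal{H}$ the space $\mathcal{B}_c(\mathcal{H})'$ of all continuous linear functionals on the space $\mathcal{B}_c(\mathcal{H})$ of all compact operators on $\mathcal{H}$ and the space $\mathcal{B}_1(\mathcal{H})$ of all trace class operators are (isometrically) isomorphic, i.e.,

\[
\mathcal{B}_c(\mathcal{H})' \cong \mathcal{B}_1(\mathcal{H}).
\]

The isomorphism $\mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_c(\mathcal{H})'$ is given by $B \mapsto \phi_B$ with

\[
\phi_B(A) = \operatorname{Tr}(BA) \quad \text{for all } A \in \mathcal{B}_c(\mathcal{H}). \tag{26.10}
\]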


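Proof As mentioned above, given $F \in \mathcal{B}_c(\mathcal{H})'$, we know $F|_{\mathcal{B}_2(\mathcal{H})} \in \mathcal{B}_2(\mathcal{H})'$ and thus there is a unique $B \in \mathcal{B}_2(\mathcal{H})$ such that

\[
F(A) = \operatorname{Tr}(BA) \quad \text{for all } A \in \mathcal{B}_2(\mathcal{H}).
\]

In order to show that actually $B \in \mathcal{B}_1(\mathcal{H})$ we use the characterization of trace class operators as given in Proposition 26.1 and estimate $S_3(B)$. To this end take any two ONS $\{e_n\}$ and $\{f_n\}$ in $\mathcal{H}$ and observe that there is $\alpha_n \in \mathbb{R}$ such that

\[
e^{i\alpha_n} \langle f_n, B e_n\rangle = |\langle f_n, B e_n\rangle| .
\]

Introduce the finite rank operators $[e_n, f_n]$ defined by $[e_n, f_n] x = \langle f_n, x\rangle e_n$ and then the finite rank operators

\[
A_m = \sum_{n=1}^{m} e^{i\alpha_n} [e_n, f_n] .
\]

Since $\|A_m x\|^2 = \sum_{n=1}^{m} |\langle f_n, x\rangle|^2 \le \|x\|^2$, one has $\|A_m\| \le 1$. Thus we write, for any $m \in \mathbb{N}$,

\[
\sum_{n=1}^{m} |\langle f_n, B e_n\rangle| = \sum_{n=1}^{m} e^{i\alpha_n} \langle f_n, B e_n\rangle = \operatorname{Tr}(BA_m) = F(A_m)
\]

since $\langle f_n, B e_n\rangle = \operatorname{Tr}(B[e_n, f_n])$. We conclude

\[
\sum_{n=1}^{m} |\langle f_n, B e_n\rangle| \le \|F\|'
\]

thus $S_3(B) \le \|F\|'$ and hence $B \in \mathcal{B}_1(\mathcal{H})$. Introduce the continuous linear functional $\phi_B : \mathcal{B}_c(\mathcal{H}) \to \mathbb{K}$ by

\[
\phi_B(A) = \operatorname{Tr}(BA) \quad \text{for all } A \in \mathcal{B}_c(\mathcal{H}).
\]

We conclude that every $F \in \mathcal{B}_c(\mathcal{H})'$ is of the form $F = \phi_B$ with a unique $B \in \mathcal{B}_1(\mathcal{H})$. Now by (26.9), it follows

\[
\|F\|' = \sup \{ |F(A)| : A \in \mathcal{B}_c(\mathcal{H}), \|A\| \le 1 \} \le \|B\|_1 .
\]

In order to show $\|F\|' = \|\phi_B\|' = \|B\|_1$, recall that $\|B\|_1 = \operatorname{Tr}(|B|)$ when $B$ has the polar decomposition $B = U|B|$ with a partial isometry $U$. For an ONB $\{e_n\}$ of $\mathcal{H}$ form the finite rank operator $A_m = \sum_{n=1}^{m} [e_n, e_n] U^*$ and calculate

\[
\operatorname{Tr}(BA_m) = \operatorname{Tr}(A_m B) = \operatorname{Tr}\Big( \sum_{n=1}^{m} [e_n, e_n] U^* B \Big) = \sum_{n=1}^{m} \operatorname{Tr}([e_n, e_n] |B|) = \sum_{n=1}^{m} \langle e_n, |B| e_n\rangle .
\]

It follows

\[
\|\phi_B\|' \ge |\operatorname{Tr}(BA_m)| = \sum_{n=1}^{m} \langle e_n, |B| e_n\rangle
\]

for all $m \in \mathbb{N}$ and thus $\|\phi_B\|' \ge \|B\|_1$, and we conclude. Basic properties of the trace show that the map $B \mapsto \phi_B$ is linear. Hence, this map is an isometric isomorphism from $\mathcal{B}_1(\mathcal{H})$ to $\mathcal{B}_c(\mathcal{H})'$.

In a similar way one can determine the dual space of the space of all trace class operators.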


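Theorem 26.5 For a separable Hilbert space $\mathcal{H}$ the space $\mathcal{B}_1(\mathcal{H})'$ of all continuous linear functionals on the space $\mathcal{B}_1(\mathcal{H})$ of all trace class operators on $\mathcal{H}$ and the space $\mathcal{B}(\mathcal{H})$ of all bounded linear operators are (isometrically) isomorphic, i.e.,

\[
\mathcal{B}_1(\mathcal{H})' \cong \mathcal{B}(\mathcal{H}).
\]

The isomorphism $\mathcal{B}(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})'$ is given by $B \mapsto \psi_B$ where

\[
\psi_B(A) = \operatorname{Tr}(BA) \quad \text{for all } A \in \mathcal{B}_1(\mathcal{H}). \tag{26.11}
\]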




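Proof In the first step, we show that given $f \in \mathcal{B}_1(\mathcal{H})'$, there is a unique $B = B_f \in \mathcal{B}(\mathcal{H})$ such that $f = \psi_B$ where again $\psi_B$ is defined by the trace, i.e., $\psi_B(A) = \operatorname{Tr}(BA)$ for all $A \in \mathcal{B}_1(\mathcal{H})$. For all $x, y \in \mathcal{H}$ define

\[
b_f(x, y) = f([y, x])
\]

where the operator $[y, x]$ is defined as above. Since $[y, x] \in \mathcal{B}_f(\mathcal{H}) \subset \mathcal{B}_1(\mathcal{H})$, $b_f$ is well defined on $\mathcal{H} \times \mathcal{H}$. Linearity of $f$ implies immediately that $b_f$ is a sesquilinear form on $\mathcal{H}$. This form is continuous: For all $x, y \in \mathcal{H}$ the estimate

\[
|b_f(x, y)| = |f([y, x])| \le \|f\|' \, \|[y, x]\|_1 \le \|f\|' \, \|x\| \, \|y\|
\]

holds, since by Proposition 26.1 one has, using Schwarz' and Bessel's inequality, $\|[y, x]\|_1 = S_3([y, x]) \le \|x\| \, \|y\|$. Therefore by Theorem 20.2.1, there is a unique bounded linear operator $B$ such that $b_f(x, y) = \langle x, By\rangle$, i.e.,

\[
f([y, x]) = \langle x, By\rangle = \operatorname{Tr}(B[y, x]) \quad \text{for all } x, y \in \mathcal{H}.
\]

The last identity follows from the completeness relation for an ONB $\{e_n\}$ of $\mathcal{H}$: $\operatorname{Tr}(B[y, x]) = \sum_n \langle e_n, B[y, x] e_n\rangle = \sum_n \langle e_n, By \langle x, e_n\rangle\rangle = \sum_n \langle e_n, By\rangle \langle x, e_n\rangle = \langle x, By\rangle$. By linearity this representation of $f$ is extended to $\mathcal{B}_f(\mathcal{H}) \subset \mathcal{B}_1(\mathcal{H})$. And since both $f$ and $\operatorname{Tr}$ are continuous with respect to the trace norm this representation has a unique extension to all of $\mathcal{B}_1(\mathcal{H})$ ($\mathcal{B}_f(\mathcal{H})$ is dense in $\mathcal{B}_1(\mathcal{H})$):

\[
f(A) = \operatorname{Tr}(BA) = \psi_B(A) \quad \text{for all } A \in \mathcal{B}_1(\mathcal{H}).
\]

Linearity of $\operatorname{Tr}$ implies easily that $B \mapsto \psi_B$ is a linear map from $\mathcal{B}(\mathcal{H})$ to $\mathcal{B}_1(\mathcal{H})'$. Finally, we show that this map is isometric. The continuity estimate (26.9) for the trace gives for $B \in \mathcal{B}(\mathcal{H})$

\[
\|\psi_B\|' = \sup \{ |\psi_B(A)| : A \in \mathcal{B}_1(\mathcal{H}), \|A\|_1 \le 1 \} \le \|B\| .
\]

We can assume $B \neq 0$. Then $\|B\| > 0$ and for any $\varepsilon \in (0, \|B\|)$ there is $x \in \mathcal{H}$, $\|x\| \le 1$, such that $\|Bx\| \ge \|B\| - \varepsilon$. Introduce $\xi = \frac{Bx}{\|Bx\|}$ and calculate as above

\[
\psi_B([x, \xi]) = \operatorname{Tr}(B[x, \xi]) = \langle \xi, Bx\rangle = \|Bx\| \ge \|B\| - \varepsilon,
\]

hence $\|\psi_B\|' \ge \|B\| - \varepsilon$. This holds for any $0 < \varepsilon < \|B\|$. We conclude

\[
\|\psi_B\|' \ge \|B\|
\]

and thus $\|\psi_B\|' = \|B\|$ and $B \mapsto \psi_B$ is an isometric map from $\mathcal{B}(\mathcal{H})$ onto $\mathcal{B}_1(\mathcal{H})'$.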
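The proof uses the rank one operators $[y, x]$ and the bound $\|[y, x]\|_1 \le \|x\| \, \|y\|$. The following short derivation, a sketch for nonzero $x, y$ under the convention $[y, x] z = \langle x, z\rangle y$ used above, shows that this bound is in fact an equality.

```latex
% Rank one operator [y,x] z = \langle x, z\rangle y, with x \neq 0 and y \neq 0.
\[
  [y,x]^* = [x,y], \qquad
  [y,x]^*[y,x]\, z = \langle x, z\rangle \langle y, y\rangle\, x = \|y\|^2\, [x,x]\, z .
\]
% Since [x,x]^2 = \|x\|^2 [x,x], the positive square root is
\[
  \big|[y,x]\big| = \frac{\|y\|}{\|x\|}\, [x,x] ,
  \qquad
  \big\|[y,x]\big\|_1 = \operatorname{Tr}\big|[y,x]\big|
  = \frac{\|y\|}{\|x\|}\, \|x\|^2 = \|x\|\,\|y\| .
\]
```

In particular the operator $[x, \xi]$ used at the end of the proof has trace norm $\|x\| \, \|\xi\| \le 1$, so it is admissible in the supremum defining $\|\psi_B\|'$.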


Remark 26.2 According to this result one has the following useful expressions for the trace norm and the operator norm: The trace norm of $T \in \mathcal{B}_1(\mathcal{H})$ is given by

\[
\|T\|_1 = \sup |\operatorname{Tr}(BT)|, \tag{26.12}
\]

where the sup is taken over all $B \in \mathcal{B}(\mathcal{H})$ with $\|B\| = 1$ and similarly the norm of $B \in \mathcal{B}(\mathcal{H})$ is

\[
\|B\| = \sup |\operatorname{Tr}(BT)|, \tag{26.13}
\]

where the sup is taken over all $T \in \mathcal{B}_1(\mathcal{H})$ with $\|T\|_1 = 1$. Since $\mathcal{B}_1(\mathcal{H})$ is generated by the cone of its positive elements one also has

\[
\|B\| = \sup |\operatorname{Tr}(BW)|, \tag{26.14}
\]

where the sup is taken over all density matrices $W$, i.e., $W \in \mathcal{B}_1(\mathcal{H})$, $W \ge 0$, and $\|W\|_1 = \operatorname{Tr}(W) = 1$.


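Remark 26.3 It is instructive to compare the chain of continuous dense embeddings

\[
\mathcal{B}_f(\mathcal{H}) \hookrightarrow \mathcal{B}_1(\mathcal{H}) \hookrightarrow \mathcal{B}_2(\mathcal{H}) \hookrightarrow \mathcal{B}_c(\mathcal{H}) \hookrightarrow \mathcal{B}(\mathcal{H}) \tag{26.15}
\]

for the spaces of bounded linear operators on a separable Hilbert space $\mathcal{H}$ over the field $\mathbb{K}$ with the chain of embeddings for the corresponding sequence spaces

\[
f(\mathbb{K}) \hookrightarrow \ell^1(\mathbb{K}) \hookrightarrow \ell^2(\mathbb{K}) \hookrightarrow c_0(\mathbb{K}) \hookrightarrow \ell^{\infty}(\mathbb{K}), \tag{26.16}
\]

where $f(\mathbb{K})$ denotes the space of terminating sequences and $c_0(\mathbb{K})$ the space of null sequences. And our results on the spectral representations of operators in $\mathcal{B}_1(\mathcal{H})$, $\mathcal{B}_2(\mathcal{H})$, and $\mathcal{B}_c(\mathcal{H})$ indicate how these spaces are related to the sequence spaces $\ell^1(\mathbb{K})$, $\ell^2(\mathbb{K})$, and $c_0(\mathbb{K})$.

For these sequence spaces it is well known that $\ell^{\infty}(\mathbb{K})$ is the topological dual of $\ell^1(\mathbb{K})$, $\ell^1(\mathbb{K})' = \ell^{\infty}(\mathbb{K})$, and that $\ell^1(\mathbb{K})$ is the dual of $c_0(\mathbb{K})$, $c_0(\mathbb{K})' = \ell^1(\mathbb{K})$, as the counterpart of the last two results: $\mathcal{B}_1(\mathcal{H})' = \mathcal{B}(\mathcal{H})$ and $\mathcal{B}_c(\mathcal{H})' = \mathcal{B}_1(\mathcal{H})$.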


26.3 Related Locally Convex Topologies on B(H) 


Recall that in Sect. 21.4, we introduced the weak and the strong operator topologies on 
B(H) as the topology of pointwise weak, respectively, pointwise norm convergence. 
In the study of operator algebras some further topologies play an important role. Here 
we restrict our discussion to the operator algebra B(H), respectively, subalgebras of 
it. Recall also that in the second chapter we had learned how to define locally convex 
topologies on vector spaces in terms of suitable systems of seminorms. This approach 
we use here again. We begin by recalling the defining seminorms for the strong and 
the weak topology. 

The strong topology on $\mathcal{B}(\mathcal{H})$ is defined by the system of seminorms $p_x$, $x \in \mathcal{H}$,
with

$$p_x(A) = \|Ax\|, \quad A \in \mathcal{B}(\mathcal{H}).$$

Sometimes it is important to have a topology on $\mathcal{B}(\mathcal{H})$ with respect to which the
involution $*$ on $\mathcal{B}(\mathcal{H})$ is continuous. This is the case for the strong* topology defined
by the system of seminorms $p_x^*$, $x \in \mathcal{H}$, with

$$p_x^*(A) = \sqrt{\|Ax\|^2 + \|A^*x\|^2}, \quad A \in \mathcal{B}(\mathcal{H}).$$

The weak topology on $\mathcal{B}(\mathcal{H})$ is defined by the system of seminorms $p_{x,y}$, $x, y \in \mathcal{H}$,
with

$$p_{x,y}(A) = |\langle x, Ay\rangle|, \quad A \in \mathcal{B}(\mathcal{H}).$$


Similarly, one defines the σ-weak and σ-strong topologies on $\mathcal{B}(\mathcal{H})$. Often these
topologies are also called ultraweak and ultrastrong topologies, respectively.

The σ-strong topology on $\mathcal{B}(\mathcal{H})$ is defined in terms of a system of seminorms
$q = q_{\{e_n\}}$, $\{e_n\} \subset \mathcal{H}$, $\sum_n \|e_n\|^2 < \infty$, with

$$q(A) = \Bigl(\sum_n \|Ae_n\|^2\Bigr)^{1/2}, \quad A \in \mathcal{B}(\mathcal{H}).$$

And the σ-strong* topology is defined by the system of seminorms $q^*$, $q$ as above,
with

$$q^*(A) = \bigl(q(A)^2 + q(A^*)^2\bigr)^{1/2}, \quad A \in \mathcal{B}(\mathcal{H}).$$
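Next suppose that $\{e_n\}$ and $\{g_n\}$ are two sequences in $\mathcal{H}$ satisfying $\sum_n \|e_n\|^2 < \infty$
and $\sum_n \|g_n\|^2 < \infty$. Then a continuous linear functional $T = T_{\{e_n\},\{g_n\}}$ is well
defined on $\mathcal{B}(\mathcal{H})$ by (see Exercises)

$$T(A) = \sum_n \langle g_n, Ae_n\rangle, \quad A \in \mathcal{B}(\mathcal{H}). \qquad (26.17)$$

Now the σ-weak topology on $\mathcal{B}(\mathcal{H})$ is defined by the system of seminorms $p_T$, $T$
as above, by

$$p_T(A) = |T(A)| = \Bigl|\sum_n \langle g_n, Ae_n\rangle\Bigr|.$$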


Using the finite rank operators $[e_n, g_n]$ introduced earlier we can form the operator
$\hat T = \sum_n [e_n, g_n]$. For any two orthonormal systems $\{x_j\}$ and $\{y_j\}$ in $\mathcal{H}$, we estimate
by Schwarz' and Bessel's inequalities

$$\sum_j |\langle x_j, [e_n, g_n] y_j\rangle| = \sum_j |\langle x_j, e_n\rangle \langle g_n, y_j\rangle| \le \|e_n\|\,\|g_n\|$$

and thus

$$\sum_j |\langle x_j, \hat T y_j\rangle| \le \sum_n \|e_n\|\,\|g_n\| < \infty.$$

Proposition 26.1 implies that $\hat T$ is a trace class operator on $\mathcal{H}$. In the Exercises, we
show that for all $A \in \mathcal{B}(\mathcal{H})$

$$\mathrm{Tr}(A\hat T) = \sum_n \langle g_n, Ae_n\rangle = T(A), \qquad (26.18)$$

hence the functional $T$ of (26.17) is represented as the trace of the trace class operator
$\hat T$ multiplied by the argument of $T$.
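
As a finite-dimensional sanity check of (26.18), the following Python sketch compares both sides of the identity for a few vectors in $\mathbb{C}^3$. It assumes that the inner product is antilinear in its first argument and that $[e, g]$ denotes the rank-one map $x \mapsto \langle g, x\rangle e$, which is what the estimate above suggests; the matrix, the vectors, and all function names are illustrative choices, not taken from the text.

```python
# Finite-dimensional check of Tr(A T_hat) = sum_n <g_n, A e_n>, cf. (26.18).
# Assumption: <x, y> is antilinear in x, and [e, g] is the map x -> <g, x> e.

def inner(x, y):
    """Inner product <x, y>, conjugating the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rank_one(e, g):
    """Matrix of [e, g]; its (i, j) entry is e_i * conj(g_j)."""
    return [[ei * complex(gj).conjugate() for gj in g] for ei in e]

def trace_of_product(A, T):
    """Tr(A T) for square matrices of equal size."""
    n = len(A)
    return sum(A[i][j] * T[j][i] for i in range(n) for j in range(n))

# Illustrative data in C^3 (hypothetical values).
A = [[1 + 2j, 0.5, -1j],
     [2, -1, 3 + 1j],
     [0, 1j, 4]]
es = [[1, 1j, 0], [0.5, 0, -2], [0, 1, 1]]
gs = [[1, 0, 1j], [0, 2, 1], [1 - 1j, 0, 0]]

# T_hat = sum_n [e_n, g_n]
T_hat = [[0j] * 3 for _ in range(3)]
for e, g in zip(es, gs):
    R = rank_one(e, g)
    for i in range(3):
        for j in range(3):
            T_hat[i][j] += R[i][j]

lhs = trace_of_product(A, T_hat)                           # Tr(A T_hat)
rhs = sum(inner(g, apply(A, e)) for e, g in zip(es, gs))   # sum_n <g_n, A e_n>
print(abs(lhs - rhs) < 1e-12)
```

In finite dimensions the summability conditions on $\{e_n\}$ and $\{g_n\}$ are automatic, so the sketch only illustrates the algebraic identity, not the convergence argument.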

According to Theorem 26.4, the Banach space dual of the space of compact operators
$\mathcal{B}_c(\mathcal{H})$ is isometrically isomorphic to the space $\mathcal{B}_1(\mathcal{H})$ of trace class operators on
$\mathcal{H}$, and according to Theorem 26.5, the Banach space dual of $\mathcal{B}_1(\mathcal{H})$ is isometrically
isomorphic to the space $\mathcal{B}(\mathcal{H})$ of all bounded linear operators on $\mathcal{H}$.

Thus, we can state: 

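The σ-weak or ultraweak topology on $\mathcal{B}(\mathcal{H})$ is the weak*-topology from the identification
of $\mathcal{B}(\mathcal{H})$ with the dual of $\mathcal{B}_1(\mathcal{H})$, i.e., the topology generated by the family of seminorms
$\{p_{\hat T} : \hat T \in \mathcal{B}_1(\mathcal{H})\}$ defined by $p_{\hat T}(A) = |\mathrm{Tr}(\hat T A)| = |T(A)|$ for $A \in \mathcal{B}(\mathcal{H})$.

It is easy to verify that the weak topology on $\mathcal{B}(\mathcal{H})$ is the dual topology $\sigma(\mathcal{B}(\mathcal{H}),
\mathcal{B}_f(\mathcal{H}))$.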

Recall also that nonnegative trace class operators are of the form $\hat T = \sum_n [e_n, e_n]$
with $\sum_n \|e_n\|^2 < \infty$. For $A \in \mathcal{B}(\mathcal{H})$ we find

$$\mathrm{Tr}(A^*A\hat T) = \sum_n \|Ae_n\|^2 = T(A^*A),$$

where $T$ is the functional on $\mathcal{B}(\mathcal{H})$ which corresponds to $\hat T$ according to (26.18).
Hence, the defining seminorms $q$ for the σ-strong topology are actually of the form

$$q(A) = (T(A^*A))^{1/2} = (\mathrm{Tr}(A^*A\hat T))^{1/2}.$$

From these definitions it is quite obvious how to compare these topologies on
$\mathcal{B}(\mathcal{H})$: The σ-strong* topology is finer than the σ-strong topology which in turn is
finer than the σ-weak topology. And certainly the σ-strong* topology is finer than
the strong* topology and the σ-strong topology is finer than the strong topology
which is finer than the weak topology. Finally, the σ-weak topology is finer than the
weak topology. Obviously the uniform or norm topology is finer than the σ-strong*
topology.


[Figure: Main topologies on $\mathcal{B}(\mathcal{H})$ (uniform, σ-strong, σ-weak, strong, weak), with arrows indicating the direction of finer topology and the direction of implied convergence.]




Nevertheless one has the following convenient result: 


Lemma 26.2 On the closed unit ball $B = \{A \in \mathcal{B}(\mathcal{H}) : \|A\| \le 1\}$, the following
topologies coincide:

a) The weak and the σ-weak
b) The strong and the σ-strong
c) The strong* and the σ-strong*


Proof Since the proofs of these three statements are very similar, we offer explicitly
only the proof of (b).

Clearly it suffices to show that for every neighborhood $U$ of the origin for
the σ-strong topology there is a neighborhood $V$ of the origin for the strong
topology such that $V \cap B \subset U \cap B$. Such a neighborhood $U$ is of the form
$U = \{A \in \mathcal{B}(\mathcal{H}) : q(A) < r\}$ with $r > 0$ and $q(A)^2 = \sum_{n=1}^{\infty} \|Ae_n\|^2$ for some
sequence $e_n \in \mathcal{H}$ with $\sum_n \|e_n\|^2 < \infty$. Thus there is $m \in \mathbb{N}$ such that
$\sum_{n=m+1}^{\infty} \|e_n\|^2 < r^2/2$. Define a neighborhood $V$ of the origin for the strong
topology by $V = \{A \in \mathcal{B}(\mathcal{H}) : p(A) < r/\sqrt{2}\}$ with the seminorm $p$ given by
$p(A)^2 = \sum_{n=1}^{m} \|Ae_n\|^2$. Now for $A \in V \cap B$, we estimate, using $\|A\| \le 1$ in the tail,

$$q(A)^2 = \sum_{n=1}^{m} \|Ae_n\|^2 + \sum_{n=m+1}^{\infty} \|Ae_n\|^2 \le \sum_{n=1}^{m} \|Ae_n\|^2 + \sum_{n=m+1}^{\infty} \|e_n\|^2 < r^2/2 + r^2/2 = r^2$$

and conclude $A \in U \cap B$.


In addition, continuity of linear functionals is the same within each of two groups of these
topologies, as the following theorem shows.


Theorem 26.6 Suppose that $\mathcal{K} \subset \mathcal{B}(\mathcal{H})$ is a linear subspace which is σ-weakly
closed. Then for every bounded linear functional $T$ on $\mathcal{K}$ the following groups of
equivalence statements hold.

1) The following statements about $T$ are equivalent:

a) $T$ is of the form $T(\cdot) = \sum_{j=1}^{n} \langle y_j, \cdot\, x_j\rangle$ for some points $x_j, y_j \in \mathcal{H}$

b) $T$ is weakly continuous

c) $T$ is strongly continuous

d) $T$ is strongly* continuous

2) The following statements about $T$ are equivalent ($B$ is the closed unit ball in
$\mathcal{B}(\mathcal{H})$):

a) $T$ is of the form $T(\cdot) = \sum_{j=1}^{\infty} \langle y_j, \cdot\, x_j\rangle$ for some sequences $x_j, y_j \in \mathcal{H}$ with
$\sum_j \|x_j\|^2 < \infty$ and $\sum_j \|y_j\|^2 < \infty$

b) $T$ is σ-weakly continuous

c) $T$ is σ-strongly continuous

d) $T$ is σ-strongly* continuous

e) $T$ is weakly continuous on $\mathcal{K} \cap B$

f) $T$ is strongly continuous on $\mathcal{K} \cap B$

g) $T$ is strongly* continuous on $\mathcal{K} \cap B$




Proof Since the first group of equivalence statements is just a ‘finite’ variant of the 
second we do not prove it explicitly. 

For the proof of (2) we proceed in the order a. ⇒ b. ⇒ c. ⇒ d. ⇒ a. If a. is
assumed then

$$A \mapsto |T(A)| = \Bigl|\sum_j \langle y_j, Ax_j\rangle\Bigr|$$

is obviously a defining seminorm for the σ-weak topology. If we apply the Cauchy-Schwarz
inequality twice this seminorm is estimated by

$$\Bigl(\sum_j \|y_j\|^2\Bigr)^{1/2} \Bigl(\sum_j \|Ax_j\|^2\Bigr)^{1/2},$$

which is a continuous seminorm for the σ-strong topology and thus $T$ is also
σ-strongly continuous. Another elementary estimate now proves d.
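
For the reader's convenience, the two estimates used in this step can be written out as follows. This is only a spelled-out version of the argument, under the standing assumptions $\sum_j \|x_j\|^2 < \infty$ and $\sum_j \|y_j\|^2 < \infty$ of statement a.; the first inequality is Cauchy-Schwarz in $\mathcal{H}$, the second is Cauchy-Schwarz for sequences, and the last line is presumably the "elementary estimate" that gives σ-strong* continuity.

```latex
\begin{align*}
\Bigl|\sum_j \langle y_j, A x_j\rangle\Bigr|
  &\le \sum_j \|y_j\|\,\|A x_j\|
   \le \Bigl(\sum_j \|y_j\|^2\Bigr)^{1/2}
       \Bigl(\sum_j \|A x_j\|^2\Bigr)^{1/2}, \\
\sum_j \|A x_j\|^2
  &\le \sum_j \bigl(\|A x_j\|^2 + \|A^* x_j\|^2\bigr).
\end{align*}
```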

The only nontrivial part of the proof is the implication d. ⇒ a. If $T$ is σ-strongly*
continuous there is a sequence $\{x_j\}$ with $\sum_j \|x_j\|^2 < \infty$ such that for all $A \in \mathcal{K}$

$$|T(A)|^2 \le \sum_{j=1}^{\infty} \bigl(\|Ax_j\|^2 + \|A^*x_j\|^2\bigr). \qquad (26.19)$$


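Form the direct sum Hilbert space

$$\tilde{\mathcal{H}} = \bigoplus_{j=1}^{\infty} (\mathcal{H}_j \oplus \mathcal{H}_j') = \ell^2(\mathcal{H} \oplus \mathcal{H}'),$$

where $\mathcal{H}_j'$ is the dual of $\mathcal{H}_j = \mathcal{H}$ for all $j \in \mathbb{N}$. For $A \in \mathcal{B}(\mathcal{H})$ define an operator $\tilde A$
on $\tilde{\mathcal{H}}$ by setting for $\tilde y \in \tilde{\mathcal{H}}$ with components $y_j \oplus y_j' \in \mathcal{H}_j \oplus \mathcal{H}_j'$

$$(\tilde A \tilde y)_j = Ay_j \oplus (A^*y_j)', \quad j \in \mathbb{N}.$$

A straightforward estimate shows

$$\|\tilde A \tilde y\|_{\tilde{\mathcal{H}}}^2 \le \bigl(\|A\|^2 + \|A^*\|^2\bigr)\, \|\tilde y\|_{\tilde{\mathcal{H}}}^2.$$

On the subspace

$$\tilde{\mathcal{K}} = \{\tilde A \tilde x : A \in \mathcal{K}\}$$

of $\tilde{\mathcal{H}}$, where $\tilde x$ is defined by the sequence $\{x_j\}$ of the estimate (26.19), define the map
$\tilde T$ by setting

$$\tilde T(\tilde A \tilde x) = T(A).$$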
By (26.19) it follows that $\tilde T$ is a well-defined bounded linear map $\tilde{\mathcal{K}} \to \mathbb{K}$ (recall that
$\mathcal{H}_j \to \mathcal{H}_j'$ is antilinear). Theorems 15.3.2 (extension theorem) and 15.3.1 (Riesz-Fréchet)
imply that there is an element $\tilde y$ in the closure of the subspace $\tilde{\mathcal{K}}$ in $\tilde{\mathcal{H}}$ such
that

$$\tilde T(\tilde A \tilde x) = \langle \tilde y, \tilde A \tilde x\rangle_{\tilde{\mathcal{H}}}$$

for all $A \in \mathcal{K}$, thus by expanding the inner product of $\tilde{\mathcal{H}}$

$$T(A) = \tilde T(\tilde A \tilde x) = \sum_j \bigl(\langle y_j, Ax_j\rangle_{\mathcal{H}} + \langle y_j', (A^*x_j)'\rangle_{\mathcal{H}'}\bigr) = \sum_j \bigl(\langle y_j, Ax_j\rangle_{\mathcal{H}} + \langle x_j, Ay_j'\rangle_{\mathcal{H}}\bigr)$$

and therefore $T$ is of the form given in statement a.

The remaining part of the proof follows with the help of Lemma 26.2 and
Corollary 2.1 which says that a linear functional is continuous if, and only if, it is
continuous at the origin (see also the Exercises).


Remark 26.4 A considerably more comprehensive list of conditions under which 
these various locally convex topologies on B(H) agree is available in Chap. II of [7]. 


26.4 Partial Trace and Schmidt Decomposition in Separable 
Hilbert Spaces 


26.4.1 Partial Trace 


The first guess for defining the partial trace in the case of infinite dimensional Hilbert
spaces $\mathcal{H}_j$ would be, in analogy to the case of finite dimensional Hilbert spaces,
to start with the matrix representation of $A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$ with respect to an
orthonormal basis $\{e_j \otimes f_k\}$ of $\mathcal{H}_1 \otimes \mathcal{H}_2$ and to calculate the usual sums with respect to
one of the ONBs $\{e_j\}$, $\{f_k\}$, respectively. However, infinite sums might be divergent,
and we have not found any useful way to express the fact that $A$ is of trace class in
terms of properties of the matrix entries

$$A_{j_1 k_1; j_2 k_2}, \quad j_1, j_2, k_1, k_2 \in \mathbb{N}.$$

But such a procedure can be imitated by introducing a suitable quadratic form
and investigating its properties (see [8]).


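Theorem 26.7 (Existence, Definition and Basic Properties of Partial Trace) Let
$\mathcal{H}_1$ and $\mathcal{H}_2$ be two separable complex Hilbert spaces. Then there is a linear map

$$T : \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2) \to \mathcal{B}_1(\mathcal{H}_1)$$

from the space of trace class operators on $\mathcal{H}_1 \otimes \mathcal{H}_2$ into the space of trace class
operators on $\mathcal{H}_1$, which is continuous with respect to the trace norm. It has the
following properties:

$$T(A_1 \otimes A_2) = A_1 \mathrm{Tr}_{\mathcal{H}_2}(A_2) \quad \text{for all } A_i \in \mathcal{B}_1(\mathcal{H}_i),\ i = 1, 2; \qquad (26.20)$$

$$\mathrm{Tr}_{\mathcal{H}_1}(T(A)) = \mathrm{Tr}_{\mathcal{H}_1 \otimes \mathcal{H}_2}(A) \quad \text{for all } A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2); \qquad (26.21)$$

$$T((A_1 \otimes I_2)A) = A_1 T(A) \quad \text{for all } A_1 \in \mathcal{B}(\mathcal{H}_1) \text{ and all } A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2), \qquad (26.22)$$

where $I_2$ denotes the identity operator on $\mathcal{H}_2$ and $\mathcal{B}(\mathcal{H}_1)$ the space of bounded linear
operators on $\mathcal{H}_1$.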

On the basis of Property (26.20) the map $T$ is usually denoted by $\mathrm{Tr}_{\mathcal{H}_2}$, and
called the partial trace with respect to $\mathcal{H}_2$. Later in Proposition 26.2 an enhanced
characterization of the partial trace will be offered. Actually this map $T$ is surjective
(in Formula (26.20) take any fixed $A_2 \in \mathcal{B}_1(\mathcal{H}_2)$ with $\mathrm{Tr}(A_2) = 1$). In applications
to quantum physics the partial trace allows one to calculate the 'marginals' of states of
composite systems and is therefore also called conditional expectation.


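Proof Let $\{f_j;\ j \in \mathbb{N}\}$ be an orthonormal basis of $\mathcal{H}_2$. For a given operator $A \in
\mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$ define a sesquilinear form $Q_A$ on $\mathcal{H}_1$ by setting for $u, v \in \mathcal{H}_1$

$$Q_A(u, v) = \sum_{j=1}^{\infty} \langle u \otimes f_j, A(v \otimes f_j)\rangle_{1\otimes 2}. \qquad (26.23)$$

By inserting the spectral representation (26.5) for $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$ we can write this
as

$$Q_A(u, v) = \sum_{j=1}^{\infty} \sum_{n=1}^{\infty} \lambda_n \langle u \otimes f_j, x_n\rangle_{1\otimes 2} \langle e_n, v \otimes f_j\rangle_{1\otimes 2}.$$

For $u, v \in \mathcal{H}_1$ with $\|u\| = \|v\| = 1$ we know that $\{u \otimes f_j\}$ and $\{v \otimes f_j\}$ are ONS in
$\mathcal{H}_1 \otimes \mathcal{H}_2$ and thus we can estimate, using first Schwarz' inequality and then Bessel's
inequality for these ONS,

$$|Q_A(u, v)| \le \sum_n |\lambda_n| \sum_j |\langle u \otimes f_j, x_n\rangle_{1\otimes 2} \langle e_n, v \otimes f_j\rangle_{1\otimes 2}| \le \sum_n |\lambda_n|\, \|x_n\|\, \|e_n\| = \sum_n |\lambda_n| = \|A\|_1.$$

This implies for general $u, v \in \mathcal{H}_1$

$$|Q_A(u, v)| \le \|A\|_1\, \|u\|_1\, \|v\|_1,$$

and thus the sesquilinear form $Q_A$ is well defined and continuous. Therefore, the
representation formula for continuous sesquilinear forms applies and assures the
existence of a unique bounded linear operator $T(A)$ on $\mathcal{H}_1$ such that

$$Q_A(u, v) = \langle u, T(A)v\rangle_1 \quad \text{for all } u, v \in \mathcal{H}_1 \qquad (26.24)$$

and $\|T(A)\| \le \|A\|_1$.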
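In order to show $T(A) \in \mathcal{B}_1(\mathcal{H}_1)$, we use the characterization of trace class
operators as given in Proposition 26.1 and estimate $S_3(T(A))$ by inserting the spectral
representation (26.5) for $A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$. To this end take any orthonormal
sequences $\{u_n\}$ and $\{v_m\}$ in $\mathcal{H}_1$. Since then $\{u_n \otimes f_j\}$ and $\{v_m \otimes f_j\}$ are orthonormal
sequences in $\mathcal{H}_1 \otimes \mathcal{H}_2$ we can estimate as follows, again applying first Schwarz' and
then Bessel's inequality:

$$\sum_k |\langle u_k, T(A)v_k\rangle_1| = \sum_k \Bigl|\sum_{n,j} \lambda_n \langle u_k \otimes f_j, x_n\rangle_{1\otimes 2} \langle e_n, v_k \otimes f_j\rangle_{1\otimes 2}\Bigr|$$

$$\le \sum_n |\lambda_n| \sum_{k,j} |\langle u_k \otimes f_j, x_n\rangle_{1\otimes 2} \langle e_n, v_k \otimes f_j\rangle_{1\otimes 2}|$$

$$\le \sum_n |\lambda_n|\, \|e_n\|_{1\otimes 2}\, \|x_n\|_{1\otimes 2} = \sum_n |\lambda_n| = \|A\|_1 < \infty.$$

We conclude $T(A) \in \mathcal{B}_1(\mathcal{H}_1)$ and

$$\|T(A)\|_1 \le \|A\|_1. \qquad (26.25)$$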


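The above definition of $T(A)$ is based on the choice of an orthonormal basis
$\{f_j;\ j \in \mathbb{N}\}$. However, as in the case of a trace, the value of $T(A)$ does actually
not depend on the basis which is used to calculate it. Suppose that $\{h_\nu;\ \nu \in \mathbb{N}\}$ is
another orthonormal basis of $\mathcal{H}_2$. Express the $f_j$ in terms of the new basis, i.e.,

$$f_j = \sum_\nu u_{j\nu} h_\nu, \quad u_{j\nu} \in \mathbb{C}.$$

Since the transition from one orthonormal basis to another is given by a unitary
operator, one knows $\sum_j \overline{u_{j\nu}}\, u_{j\mu} = \delta_{\nu\mu}$. Now calculate for $u, v \in \mathcal{H}_1$

$$\sum_j \langle u \otimes f_j, A(v \otimes f_j)\rangle_{1\otimes 2} = \sum_{j,\nu,\mu} \overline{u_{j\nu}}\, u_{j\mu} \langle u \otimes h_\nu, A(v \otimes h_\mu)\rangle_{1\otimes 2}$$

$$= \sum_{\nu,\mu} \sum_j \overline{u_{j\nu}}\, u_{j\mu} \langle u \otimes h_\nu, A(v \otimes h_\mu)\rangle_{1\otimes 2} = \sum_\nu \langle u \otimes h_\nu, A(v \otimes h_\nu)\rangle_{1\otimes 2}.$$

Therefore, the sesquilinear form $Q_A$ does not depend on the orthonormal basis which
is used to calculate it. We conclude that the definition of $T(A)$ does not depend on
the basis.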

Equations (26.23) and (26.24) imply immediately that our map T is linear and 
thus by (26.25) continuity with respect to the trace norms follows. 

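Next, we verify the basic properties (26.20), (26.21), and (26.22). For $A_i \in
\mathcal{B}_1(\mathcal{H}_i)$, $i = 1, 2$, one finds by applying the definitions for all $u, v \in \mathcal{H}_1$

$$\langle u, T(A_1 \otimes A_2)v\rangle_1 = \sum_j \langle u \otimes f_j, (A_1 \otimes A_2)(v \otimes f_j)\rangle_{1\otimes 2} = \sum_j \langle u, A_1 v\rangle_1 \langle f_j, A_2 f_j\rangle_2 = \langle u, A_1 v\rangle_1\, \mathrm{Tr}_{\mathcal{H}_2}(A_2),$$

hence $T(A_1 \otimes A_2) = A_1 \mathrm{Tr}_{\mathcal{H}_2}(A_2)$, i.e., (26.20) holds.
For $A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$ we calculate, using an orthonormal basis $\{e_i;\ i \in \mathbb{N}\}$
of $\mathcal{H}_1$,

$$\mathrm{Tr}_{\mathcal{H}_1}(T(A)) = \sum_i \langle e_i, T(A)e_i\rangle_1 = \sum_i \sum_j \langle e_i \otimes f_j, A(e_i \otimes f_j)\rangle_{1\otimes 2} = \mathrm{Tr}_{\mathcal{H}_1 \otimes \mathcal{H}_2}(A)$$

and find that (26.21) holds.

Finally, take any bounded linear operator $A_1$ on $\mathcal{H}_1$, any $A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$, and
any vectors $u, v \in \mathcal{H}_1$. Our definition gives

$$\langle u, T((A_1 \otimes I_2)A)v\rangle_1 = \sum_j \langle u \otimes f_j, (A_1 \otimes I_2)A(v \otimes f_j)\rangle_{1\otimes 2}$$

$$= \sum_j \langle A_1^* u \otimes f_j, A(v \otimes f_j)\rangle_{1\otimes 2} = \langle A_1^* u, T(A)v\rangle_1 = \langle u, A_1 T(A)v\rangle_1,$$

and thus $T((A_1 \otimes I_2)A) = A_1 T(A)$, i.e., (26.22) is established.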
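
In finite dimensions the sesquilinear form (26.23) reduces to a sum over matrix entries, so the three properties can be checked numerically. The following Python sketch does this for $\mathbb{C}^2 \otimes \mathbb{C}^2$; the index convention (the product basis vector $e_i \otimes f_k$ is given the combined index $i \cdot d_2 + k$), the test matrices, and all function names are assumptions made for the illustration.

```python
# Partial trace over the second factor of C^d1 (x) C^d2, in the product basis
# e_i (x) f_k with combined index i*d2 + k (an assumed convention).

def kron(A, B):
    """Matrix of A (x) B in the product basis."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def partial_trace_2(M, d1, d2):
    """Entries <e_i, T(M) e_j> = sum_k <e_i (x) f_k, M (e_j (x) f_k)>."""
    return [[sum(M[i * d2 + k][j * d2 + k] for k in range(d2))
             for j in range(d1)] for i in range(d1)]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

d1 = d2 = 2
A1 = [[1, 2j], [0.5, -1]]          # illustrative operator on the first factor
A2 = [[3, 1], [1j, 2]]             # illustrative operator on the second factor
I2 = [[1, 0], [0, 1]]
M = [[complex(r + 1, c - r) for c in range(4)] for r in range(4)]  # generic 4x4 matrix

T_M = partial_trace_2(M, d1, d2)

# (26.20): T(A1 (x) A2) = A1 * Tr(A2)
prop_26_20 = close(partial_trace_2(kron(A1, A2), d1, d2),
                   [[a * trace(A2) for a in row] for row in A1])
# (26.21): Tr(T(M)) = Tr(M)
prop_26_21 = abs(trace(T_M) - trace(M)) < 1e-12
# (26.22): T((A1 (x) I2) M) = A1 T(M)
prop_26_22 = close(partial_trace_2(matmul(kron(A1, I2), M), d1, d2),
                   matmul(A1, T_M))
print(prop_26_20, prop_26_21, prop_26_22)
```

The sketch says nothing about the analytic part of the proof (convergence of the series and the trace norm bound), which is trivial for matrices.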


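Corollary 26.5 For all bounded linear operators $A_1$ on $\mathcal{H}_1$ and all $A \in \mathcal{B}_1(\mathcal{H}_1 \otimes
\mathcal{H}_2)$ one has

$$\mathrm{Tr}_{\mathcal{H}_1 \otimes \mathcal{H}_2}((A_1 \otimes I_2)A) = \mathrm{Tr}_{\mathcal{H}_1}(A_1 \mathrm{Tr}_{\mathcal{H}_2}(A)). \qquad (26.26)$$

Proof Apply first (26.21) and then (26.22) and observe that the product of a trace
class operator with a bounded linear operator is again a trace class operator (see
Theorem 26.2).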


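Proposition 26.2 (Partial Trace Characterization) Suppose that $\mathcal{H}_1$ and $\mathcal{H}_2$ are
two separable Hilbert spaces and that a linear map $L : \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2) \to \mathcal{B}_1(\mathcal{H}_1)$
satisfies

$$\mathrm{Tr}_{\mathcal{H}_1}(P L(A)) = \mathrm{Tr}_{\mathcal{H}_1 \otimes \mathcal{H}_2}((P \otimes I_2)A) \qquad (26.27)$$

for all finite rank orthogonal projectors $P$ on $\mathcal{H}_1$ and all $A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$. Then $L$
is the partial trace with respect to $\mathcal{H}_2$:

$$L(A) = \mathrm{Tr}_{\mathcal{H}_2}(A). \qquad (26.28)$$

Proof By taking linear combinations of (26.27) for finite rank projectors $P_j$ we
conclude that

$$\mathrm{Tr}_{\mathcal{H}_1}(B L(A)) = \mathrm{Tr}_{\mathcal{H}_1 \otimes \mathcal{H}_2}((B \otimes I_2)A)$$

holds for all $B \in \mathcal{B}_f(\mathcal{H}_1)$ and all $A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$. Hence, by Eq. (26.26), we find

$$\mathrm{Tr}_{\mathcal{H}_1}(B L(A)) = \mathrm{Tr}_{\mathcal{H}_1}(B\, \mathrm{Tr}_{\mathcal{H}_2}(A))$$

or, observing Corollary 26.4 and taking the definition of the inner product on $\mathcal{B}_2(\mathcal{H}_1)$
in (26.3) into account,

$$\langle B, L(A)\rangle_2 = \langle B, \mathrm{Tr}_{\mathcal{H}_2}(A)\rangle_2$$

for all $B \in \mathcal{B}_f(\mathcal{H}_1)$ and all $A \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$. Since $\mathcal{B}_f(\mathcal{H}_1)$ is dense in $\mathcal{B}_2(\mathcal{H}_1)$ we
conclude.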




26.4.2 Schmidt Decomposition 


The elements of $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$ can be described explicitly in terms of orthonormal
bases of $\mathcal{H}_i$, $i = 1, 2$: Suppose $e_j$, $j \in \mathbb{N}$, is an orthonormal basis of $\mathcal{H}_1$ and $f_j$, $j \in \mathbb{N}$,
is an orthonormal basis of $\mathcal{H}_2$. Then every element $x \in \mathcal{H}$ is of the form (see for
instance [3])

$$x = \sum_{i,j=1}^{\infty} c_{ij}\, e_i \otimes f_j, \quad c_{ij} \in \mathbb{C}, \quad \sum_{i,j=1}^{\infty} |c_{ij}|^2 = \|x\|^2. \qquad (26.29)$$

However, in the discussion of entanglement in quantum physics and quantum information
theory (see [1]), it has become the standard to use the Schmidt representation
of vectors in $\mathcal{H}$ which reduces the double sum in (26.29) to a simple biorthogonal
sum.


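Theorem 26.8 (Schmidt Decomposition) For every $x \in \mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$ there are
nonnegative numbers $p_n$ and orthonormal bases $g_n$, $n \in \mathbb{N}$, of $\mathcal{H}_1$ and $h_n$, $n \in \mathbb{N}$,
of $\mathcal{H}_2$ such that

$$x = \sum_{n=1}^{\infty} p_n\, g_n \otimes h_n, \quad \sum_{n=1}^{\infty} p_n^2 = \|x\|^2. \qquad (26.30)$$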


Proof We use the standard isomorphism $I$ between the Hilbert tensor product $\mathcal{H}_1 \otimes
\mathcal{H}_2$ and the space $\mathcal{L}_{HS}(\mathcal{H}_1; \mathcal{H}_2)$ of Hilbert-Schmidt operators $\mathcal{H}_1 \to \mathcal{H}_2$. In Dirac
notation $I$ is given by $I(x) = \sum_{i,j} c_{ij} |f_j\rangle\langle e_i|$, i.e., for all $y \in \mathcal{H}_1$ we have

$$I(x)(y) = \sum_{i,j=1}^{\infty} c_{ij} \langle e_i, y\rangle_1 f_j,$$

where $\langle \cdot, \cdot\rangle_1$ denotes the inner product of $\mathcal{H}_1$ and where $x$ is given by (26.29). It is
easily seen that $I(x)$ is a well-defined bounded linear operator $\mathcal{H}_1 \to \mathcal{H}_2$. Hence,
$I(x)^* I(x)$ is a bounded linear operator $\mathcal{H}_1 \to \mathcal{H}_1$ which is of trace class since

$$\mathrm{Tr}_{\mathcal{H}_1}(I(x)^* I(x)) = \sum_{i=1}^{\infty} \langle I(x)e_i, I(x)e_i\rangle_2 = \sum_{i,j=1}^{\infty} |c_{ij}|^2 = \|x\|^2.$$

Thus, $I(x)$ is a Hilbert-Schmidt operator with norm

$$\|I(x)\|_2 = +\sqrt{\mathrm{Tr}_{\mathcal{H}_1}(I(x)^* I(x))} = \|x\|. \qquad (26.31)$$

Since $I(x)^* I(x)$ is a positive trace class operator on $\mathcal{H}_1$ it is of the form

$$I(x)^* I(x) = \sum_{n=1}^{\infty} \lambda_n P_{g_n}, \quad \sum_{n=1}^{\infty} \lambda_n = \|x\|^2, \qquad (26.32)$$

where $P_{g_n} = |g_n\rangle\langle g_n|$ is the orthogonal projector onto the subspace spanned by the
element $g_n$ of an orthonormal basis $g_n$, $n \in \mathbb{N}$, of $\mathcal{H}_1$.

This spectral representation allows one to calculate the square root of the
operator $I(x)^* I(x)$ easily:

$$|I(x)| = +\sqrt{I(x)^* I(x)} = \sum_{n=1}^{\infty} \sqrt{\lambda_n}\, P_{g_n}. \qquad (26.33)$$

This prepares for the polar decomposition (see, for instance, [3]) of the operator
$I(x) : \mathcal{H}_1 \to \mathcal{H}_2$, according to which this operator can be written as

$$I(x) = U|I(x)|, \quad U = \text{partial isometry } \mathcal{H}_1 \to \mathcal{H}_2, \qquad (26.34)$$

i.e., $U$ is an isometry from $(\ker I(x))^\perp \subset \mathcal{H}_1$ onto $\mathrm{ran}\, I(x) \subset \mathcal{H}_2$.

Finally denote by $h_n$, $n \in \mathbb{N}$, the orthonormal system obtained from the basis
$g_n$, $n \in \mathbb{N}$, under this partial isometry, $h_n = Ug_n$. Hence, from (26.33) and (26.34)
we get

$$I(x) = \sum_{n=1}^{\infty} \sqrt{\lambda_n}\, |h_n\rangle\langle g_n|.$$

If we identify $p_n$ with $\sqrt{\lambda_n}$ and if we apply $I^{-1}$ to this identity, then Eq. (26.30)
follows.


Remark 26.5 A (vector) state $x$ of a composite system with Hilbert space $\mathcal{H} =
\mathcal{H}_1 \otimes \mathcal{H}_2$ is called entangled if its Schmidt decomposition (26.30) contains more
than one term. Otherwise such a state is called separable or a product state, and
thus is of the form $e \otimes f$, $e \in \mathcal{H}_1$, $f \in \mathcal{H}_2$.
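
For two two-dimensional factors the construction in the proof of Theorem 26.8 can be carried out by hand: the numbers $p_n$ are the square roots of the eigenvalues of the $2 \times 2$ matrix of $I(x)^* I(x)$, and a $2 \times 2$ Hermitian matrix has eigenvalues given by a closed formula in its trace and determinant. The following Python sketch uses this to compute the Schmidt coefficients and to test the entanglement criterion of Remark 26.5. The restriction to the $2 \times 2$ case, the tolerance, the example states, and the function names are assumptions of the sketch.

```python
import math

def schmidt_coefficients(c):
    """Schmidt coefficients of x = sum_ij c[i][j] e_i (x) f_j in the 2 x 2 case.

    g is the matrix of I(x)* I(x) in the basis e_1, e_2, with entries
    g[i][k] = sum_j conj(c[i][j]) * c[k][j]; its eigenvalues are the lambda_n
    of (26.32).
    """
    g = [[sum(complex(c[i][j]).conjugate() * c[k][j] for j in range(2))
          for k in range(2)] for i in range(2)]
    t = (g[0][0] + g[1][1]).real                       # trace = ||x||^2
    d = (g[0][0] * g[1][1] - g[0][1] * g[1][0]).real   # determinant
    disc = math.sqrt(max(t * t - 4 * d, 0.0))
    lams = [(t + disc) / 2, max((t - disc) / 2, 0.0)]
    return [math.sqrt(lam) for lam in lams]

def is_entangled(c, tol=1e-6):
    """More than one nonzero Schmidt coefficient, cf. Remark 26.5."""
    return sum(1 for p in schmidt_coefficients(c) if p > tol) > 1

s = 1 / math.sqrt(2)
bell = [[s, 0], [0, s]]                   # (e_1 (x) f_1 + e_2 (x) f_2) / sqrt(2)
product = [[0.5, -0.5], [0.5, -0.5]]      # (e_1 + e_2)/sqrt(2) (x) (f_1 - f_2)/sqrt(2)

bell_p = schmidt_coefficients(bell)
product_p = schmidt_coefficients(product)
print(bell_p, is_entangled(bell))
print(product_p, is_entangled(product))
```

For larger factors one would normally use a singular value decomposition from a numerical library instead of the closed formula; the singular values of the coefficient matrix $(c_{ij})$ are the Schmidt coefficients.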


26.5 Some Applications in Quantum Mechanics 


Remark 26.6 In the case of concrete Hilbert spaces, the trace can often be evaluated
explicitly without much effort, usually more easily than, for instance, the operator norm.
Consider the Hilbert-Schmidt integral operator $K$ in $L^2(\mathbb{R}^n)$ discussed earlier. It is
defined in terms of a kernel $k \in L^2(\mathbb{R}^n \times \mathbb{R}^n)$ by

$$K\psi(x) = \int_{\mathbb{R}^n} k(x, y)\psi(y)\, dy \quad \forall \psi \in L^2(\mathbb{R}^n).$$

In the Exercises, we show that

$$\mathrm{Tr}(K^*K) = \int\!\!\int \overline{k(x, y)}\, k(x, y)\, dx\, dy.$$

A special class of trace class operators is of great importance for quantum mechanics,
which we briefly mention.




Definition 26.2 A density matrix or statistical operator $W$ on a separable Hilbert
space $\mathcal{H}$ is a trace class operator which is symmetric ($W^* = W$), positive ($\langle x, Wx\rangle \ge
0$ for all $x \in \mathcal{H}$), and normalized ($\mathrm{Tr}\, W = 1$).

Note that in a complex Hilbert space symmetry is implied by positivity. In quan- 
tum mechanics density matrices are usually denoted by p. Density matrices can be 
characterized explicitly. 


Theorem 26.9 A bounded linear operator $W$ on a separable Hilbert space $\mathcal{H}$ is a
density matrix if, and only if, there are a sequence of nonnegative numbers $p_n \ge 0$
with $\sum_{n=1}^{\infty} p_n = 1$ and an orthonormal basis $\{e_n : n \in \mathbb{N}\}$ of $\mathcal{H}$ such that for all
$x \in \mathcal{H}$,

$$Wx = \sum_{n=1}^{\infty} p_n \langle e_n, x\rangle e_n, \qquad (26.35)$$

i.e., $W = \sum_{n=1}^{\infty} p_n P_{e_n}$, $P_{e_n}$ = projector onto the subspace $\mathbb{K} e_n$.

Proof Using the spectral representation (26.3) of trace class operators the proof is
straightforward and is left as an exercise.


The results of this chapter have important applications in quantum mechanics, but 
also in other areas. We mention, respectively sketch, some of these applications 
briefly. 

We begin with a reminder of some of the basic principles of quantum mechanics 
(see, for instance, [4, 5]). 


1. The states of a quantum mechanical system are described in terms of density 
matrices on a separable complex Hilbert space H. 

2. The observables of the systems are represented by self-adjoint operators in H. 

3. The mean value or expectation value of an observable a in a state z is equal to the 
expectation value E(A, W) of the corresponding operators in H; if the self-adjoint 
operator A represents the observable a and the density matrix W represents the 
state z, this means that 


m(a, z) = E(A, W) = Tr(AW). 


Naturally, the mean value m(a, z) is considered as the mean value of the results 
of a measurement procedure. Here we have to assume that AW is a trace class 
operator, reflecting the fact that not all observables can be measured in all states. 

4. Examples of density matrices $W$ are projectors $P_e$ on $\mathcal{H}$, $e \in \mathcal{H}$, $\|e\| = 1$, i.e.,
$Wx = \langle e, x\rangle e$. Such states are called vector states and $e$ the representing vector.
Then clearly $E(A, P_e) = \langle e, Ae\rangle = \mathrm{Tr}(P_e A)$.

5. Convex combinations of states, i.e., $\sum_{j=1}^{n} \lambda_j W_j$ of states $W_j$, are again states (here
$\lambda_j \ge 0$ for all $j$ and $\sum_{j=1}^{n} \lambda_j = 1$). Those states which cannot be represented as
nontrivial convex combinations of other states are called extremal or pure states. 
Under quite general conditions one can prove: There are extremal states and the 
set of all convex combinations of pure states is dense in the space of all states 
(Theorem of Krein—Milman, [6], not discussed here). 
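
Principles 3 to 5 can be illustrated in $\mathbb{C}^2$. The Python sketch below builds a density matrix as a convex combination of two vector states, in the form (26.35), and compares the expectation value $\mathrm{Tr}(AW)$ with the weighted sum of the vector-state expectations $\langle e_n, Ae_n\rangle$. The observable, the weights, the basis, and the function names are illustrative assumptions; the inner product is taken to be antilinear in its first argument.

```python
import math

def inner(x, y):
    """Inner product <x, y>, conjugating the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def apply(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def projector(e):
    """Matrix of P_e: x -> <e, x> e for a unit vector e."""
    return [[ei * complex(ej).conjugate() for ej in e] for ei in e]

def trace_of_product(A, W):
    n = len(A)
    return sum(A[i][j] * W[j][i] for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]          # orthonormal basis e_1, e_2 of C^2
weights = [0.25, 0.75]             # p_n >= 0 with sum 1
A = [[1, 2 - 1j], [2 + 1j, -1]]    # a self-adjoint observable (hypothetical)

# W = sum_n p_n P_{e_n}, cf. (26.35)
projs = [projector(e) for e in basis]
W = [[sum(p * P[i][j] for p, P in zip(weights, projs)) for j in range(2)]
     for i in range(2)]

trace_W = (W[0][0] + W[1][1]).real
mean_via_trace = trace_of_product(A, W)                              # Tr(A W)
mean_via_sum = sum(p * inner(e, apply(A, e)) for p, e in zip(weights, basis))
print(trace_W, mean_via_trace, mean_via_sum)
```

With equal weights replaced by a single weight 1 the same code reproduces the vector-state formula $E(A, P_e) = \langle e, Ae\rangle$ of item 4.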




Thus we learn how projectors and density matrices enter quantum
mechanics.

Next we discuss a basic application of Stone's Theorem 23.2 on groups of
unitary operators. As we had argued earlier, the Hilbert space of an elementary
localizable particle in one dimension is the separable Hilbert space $L^2(\mathbb{R})$. The
translation of elements $f \in L^2(\mathbb{R})$ is described by the unitary operators $U(a)$,
$a \in \mathbb{R}$: $(U(a)f)(x) = f_a(x) = f(x - a)$. It is not difficult to show that this
one-parameter group of unitary operators acts strongly continuously on $L^2(\mathbb{R})$: One
shows $\lim_{a \to 0} \|f_a - f\|_2 = 0$. Now Stone's theorem applies. It says that this group
is generated by a self-adjoint operator $P$ which is defined on the domain


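$$D = \Bigl\{f \in L^2(\mathbb{R}) : \lim_{a \to 0} \tfrac{1}{a}(f_a - f) \text{ exists in } L^2(\mathbb{R})\Bigr\}$$

by

$$Pf = \frac{1}{i} \lim_{a \to 0} \frac{f - f_a}{a} \quad \forall f \in D.$$

The domain $D$ is known to be $D = W^1(\mathbb{R}) = \{f \in L^2(\mathbb{R}) : f' \in L^2(\mathbb{R})\}$ and clearly
$Pf = -if' = -i\frac{df}{dx}$. This operator $P$ represents the momentum of the particle
which is consistent with the fact that $P$ generates the translations:

$$U(a) = e^{-iaP}.$$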


As an illustration of the use of trace class operators and the trace functional we discuss 
a general form of the Heisenberg uncertainty principle. Given a density matrix W 
on a separable Hilbert space H, introduce the set 


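$$\mathcal{O}_W = \{A \in \mathcal{B}(\mathcal{H}) : A^*AW \in \mathcal{B}_1(\mathcal{H})\}$$

and a functional on $\mathcal{O}_W \times \mathcal{O}_W$,

$$(A, B) \mapsto \langle A, B\rangle_W = \mathrm{Tr}(A^*BW).$$

One shows (see Exercises) that this is a sesquilinear form on $\mathcal{O}_W$ which is positive
semi-definite ($\langle A, A\rangle_W \ge 0$), hence the Cauchy-Schwarz inequality applies, i.e.,

$$|\langle A, B\rangle_W| \le \sqrt{\langle A, A\rangle_W}\, \sqrt{\langle B, B\rangle_W} \quad \forall A, B \in \mathcal{O}_W.$$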


Now consider two self-adjoint operators $A$ and $B$ such that all the operators $AAW$, $BBW$, $AW$,
$BW$, $ABW$, $BAW$ are of trace class. Then, with the expectation values $\langle A\rangle_W = \mathrm{Tr}(AW)$
and $\langle B\rangle_W = \mathrm{Tr}(BW)$, the following quantities are well defined:

$$\tilde A = A - \langle A\rangle_W I, \quad \tilde B = B - \langle B\rangle_W I$$

and then

$$\Delta_W(A) = \sqrt{\mathrm{Tr}(\tilde A \tilde A W)} = \sqrt{\mathrm{Tr}(A^2 W) - \langle A\rangle_W^2},$$

$$\Delta_W(B) = \sqrt{\mathrm{Tr}(\tilde B \tilde B W)} = \sqrt{\mathrm{Tr}(B^2 W) - \langle B\rangle_W^2}.$$

The quantity $\Delta_W(A)$ is called the uncertainty of the observable 'A' in the state 'W'.
Next calculate the expectation value of the commutator $[A, B] = AB - BA$. One
finds

$$\mathrm{Tr}([\tilde A, \tilde B]W) = \mathrm{Tr}([A, B]W) = \mathrm{Tr}(\tilde A \tilde B W) - \mathrm{Tr}(\tilde B \tilde A W) = \langle \tilde A, \tilde B\rangle_W - \langle \tilde B, \tilde A\rangle_W$$

and by the above inequality this expectation value is bounded by the product of the
uncertainties:

$$|\mathrm{Tr}([A, B]W)| \le |\langle \tilde A, \tilde B\rangle_W| + |\langle \tilde B, \tilde A\rangle_W| \le \Delta_W(A)\Delta_W(B) + \Delta_W(B)\Delta_W(A).$$

Usually this estimate of the expectation value of the commutator in terms of the
uncertainties is written as

$$\frac{1}{2} |\mathrm{Tr}([A, B]W)| \le \Delta_W(A)\Delta_W(B)$$

and called the Heisenberg uncertainty relations (for the 'observables' A, B).

Actually in quantum mechanics many observables are represented by unbounded
self-adjoint operators. Then the above calculations do not apply directly and thus typically
they are not done for a general density matrix as above but for pure states only.
Originally they were formulated by Heisenberg for the observables of position and
momentum, represented by the self-adjoint operators $Q$ and $P$ with the commutator
$[Q, P] \subseteq iI$ and thus on suitable pure states $\psi$ the famous version

$$\frac{1}{2} \le \Delta_\psi(Q)\Delta_\psi(P)$$

of these uncertainty relations follows.
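
For bounded observables the inequality can be tested directly on matrices. The following Python sketch evaluates both sides of $\frac{1}{2}|\mathrm{Tr}([A, B]W)| \le \Delta_W(A)\Delta_W(B)$ for the Pauli matrices $\sigma_x$, $\sigma_y$ and a mixed state on $\mathbb{C}^2$. The choice of observables and of the density matrix, as well as the helper names, are assumptions made for the illustration.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def expectation(M, W):
    """Tr(M W)."""
    return trace(matmul(M, W))

def uncertainty(A, W):
    """Delta_W(A) = sqrt(Tr(A^2 W) - Tr(A W)^2) for self-adjoint A."""
    mean = complex(expectation(A, W)).real
    var = complex(expectation(matmul(A, A), W)).real - mean ** 2
    return math.sqrt(max(var, 0.0))

A = [[0, 1], [1, 0]]            # sigma_x
B = [[0, -1j], [1j, 0]]         # sigma_y
W = [[0.8, 0.1], [0.1, 0.2]]    # symmetric, positive, trace 1 (hypothetical state)

AB, BA = matmul(A, B), matmul(B, A)
commutator = [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

lhs = 0.5 * abs(expectation(commutator, W))
rhs = uncertainty(A, W) * uncertainty(B, W)
print(lhs, rhs, lhs <= rhs)
```

The sketch covers only the bounded, finite-dimensional situation; the position and momentum operators mentioned above are unbounded and need the separate treatment on pure states indicated in the text.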


26.6 Exercises 


1. Using Theorem 26.3, determine the form of the adjoint of a trace class operator
$A$ on $\mathcal{H}$ explicitly.
2. For a Hilbert-Schmidt operator $K$ with kernel $k \in L^2(\mathbb{R}^n \times \mathbb{R}^n)$, show that

$$\mathrm{Tr}(K^*K) = \int\!\!\int \overline{k(x, y)}\, k(x, y)\, dx\, dy.$$

3. Prove that statements e.-g. in the second part of Theorem 26.6 are equivalent to
the corresponding statements b.-d.
4. Show: If $A_j \in \mathcal{B}_1(\mathcal{H})$, $j = 1, 2, \ldots, N$, then

$$\mathrm{Tr}(A_1 A_2 \cdots A_N) = \mathrm{Tr}(A_2 \cdots A_N A_1) = \mathrm{Tr}(A_N A_1 A_2 \cdots A_{N-1})$$

and in general no further permutations are allowed.
5. Prove the characterization (26.35) of a density matrix $W$.
Hints: One can use $W^* = W = |W| = \sqrt{W^*W}$ and the explicit representation
of the adjoint of a trace class operator (see the previous problem).
6. Show: A density matrix $W$ on a Hilbert space $\mathcal{H}$ represents a vector state, i.e., can
be written as the projector $P_\psi$ onto the subspace generated by a vector $\psi \in \mathcal{H}$
if, and only if, $W^2 = W$.
7. Show: If a bounded linear operator $A$ has the representation (26.5), then its
absolute value is given by

$$|A|x = \sum_n |\lambda_n| \langle e_n, x\rangle e_n, \quad x \in \mathcal{H}.$$

8. Prove that (26.17) defines a continuous linear functional on $\mathcal{B}(\mathcal{H})$, under the
assumption stated with this formula.
9. Prove Formula (26.18).
10. Prove: The partial trace $\mathrm{Tr}_{\mathcal{H}_2}$ maps density matrices on $\mathcal{H}_1 \otimes \mathcal{H}_2$ to density
matrices on $\mathcal{H}_1$.


References 


1. Blanchard Ph, Brüning E. Mathematical methods in physics: distributions, Hilbert space
operators and variational methods. Vol. 26 of Progress in Mathematical Physics. Boston: Birkhäuser;
2003.

2. Blanchard Ph, Brüning E. Reply to "Comment on 'Remarks on the structure of states of
composite quantum systems and envariance' [Phys. Lett. A 355 (2006) 180]". Phys Lett A.
2011;375:1163-5. ([Phys. Lett. A 375 (2011) 1160]).

3. Davies EB. Linear operators and their spectra. Vol. 106 of Cambridge studies in advanced
mathematics. Cambridge: Cambridge University Press; 2007.

4. Haroche S, Raimond JM. Exploring the quantum: atoms, cavities, and photons. Oxford Graduate
Texts. Oxford: Oxford University Press; 2006.

5. Jauch JM. Foundations of quantum mechanics. Reading: Addison-Wesley; 1973.

6. Isham CJ. Lectures on quantum theory: mathematical and structural foundations. London:
Imperial College Press; 1995.

7. Rudin W. Functional analysis. New York: McGraw Hill; 1973.

8. Takesaki M. Theory of operator algebras I. Vol. 124 of Encyclopedia of mathematical sciences:
operator algebras and non-commutative geometry. Berlin: Springer; 2002.


Chapter 27 
The Spectral Theorem 


Recall: Every symmetric $N \times N$ matrix $A$ (i.e., every symmetric operator $A$ in the
Hilbert space $\mathbb{C}^N$) can be transformed to diagonal form, that is, there are real numbers
$\lambda_1, \ldots, \lambda_N$ and an orthonormal system $\{e_1, \ldots, e_N\}$ in $\mathbb{C}^N$ such that $Ae_k = \lambda_k e_k$,
$k = 1, \ldots, N$. If $P_k$ denotes the projector onto the subspace $\mathbb{C} e_k$ spanned by the
eigenvector $e_k$, we can represent the operator $A$ in the form

$$A = \sum_{k=1}^{N} \lambda_k P_k.$$

In this case, the spectrum of the operator $A$ is $\sigma(A) = \{\lambda_1, \ldots, \lambda_N\}$ where we use the
convention that eigenvalues of multiplicity larger than one are repeated according to
their multiplicity. Thus, we can rewrite the above representation of the operator $A$ as

$$A = \sum_{\lambda \in \sigma(A)} \lambda P_\lambda, \qquad (27.1)$$

where $P_\lambda$ is the projector onto the subspace spanned by the eigenvector corresponding
to the eigenvalue $\lambda \in \sigma(A)$. The representation (27.1) is the simplest example of the
spectral representation of a self-adjoint operator.
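
A small worked instance of (27.1) may be helpful. The Python sketch below takes the symmetric matrix with rows $(2, 1)$ and $(1, 2)$, whose eigenvalues $1$ and $3$ and normalized eigenvectors $(1, -1)/\sqrt{2}$ and $(1, 1)/\sqrt{2}$ are entered by hand, builds the projectors $P_\lambda$, and checks that $\sum_\lambda \lambda P_\lambda$ reproduces the matrix. The example matrix and the helper names are choices made for this illustration only.

```python
import math

def apply(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def outer(e):
    """Projector onto the span of the real unit vector e."""
    return [[a * b for b in e] for a in e]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A = [[2.0, 1.0], [1.0, 2.0]]
s = 1 / math.sqrt(2)
eigenpairs = [(1.0, [s, -s]), (3.0, [s, s])]   # (lambda_k, e_k), entered by hand

projectors = [outer(e) for _, e in eigenpairs]
reconstructed = [[sum(lam * P[i][j] for (lam, _), P in zip(eigenpairs, projectors))
                  for j in range(2)] for i in range(2)]

is_eigen = all(close([apply(A, e)], [[lam * c for c in e]]) for lam, e in eigenpairs)
is_spectral = close(reconstructed, A)                        # A = sum lambda P_lambda
are_projectors = all(close(matmul(P, P), P) for P in projectors)
are_orthogonal = close(matmul(projectors[0], projectors[1]), [[0.0, 0.0], [0.0, 0.0]])
print(is_eigen, is_spectral, are_projectors, are_orthogonal)
```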

We had encountered this spectral representation also for self-adjoint operators in
an infinite dimensional Hilbert space, namely for the operator $A$ defined in Eq. (24.9)
for real $s_j$, $j \in \mathbb{N}$. There we determined the spectrum as $\sigma(A) = \{s_j;\ j \in \mathbb{N}\}$. In
this case too the representation (24.9) of the operator $A$ can be written in the form
(27.1).

Clearly the characteristic feature of these two examples is that their spectrum 
consists of a finite or a countable number of eigenvalues. However, we have learned 
that there are examples of self-adjoint operators which have not only eigenvalues but 
also a continuous spectrum (see the second example in Sect. 24.2). Accordingly, the 
general form of a spectral representation of self-adjoint operators must also include 
the possibility of a continuous spectrum and therefore one would expect that the 
general form of a spectral representation is something like 


$$A = \int_{\sigma(A)} \lambda\, dP_\lambda. \qquad (27.2)$$


© Springer International Publishing Switzerland 2015 393 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_27 




It is the goal of this chapter to give a precise meaning to this formula and to prove it 
for arbitrary self-adjoint operators in a separable Hilbert space. That such a spectral 
representation is possible and how this representation has to be understood was shown 
in 1928 by J. von Neumann. Later several different proofs of this “spectral theorem” 
were given. We present a version of the proof which is not necessarily the shortest one 
but which only uses intrinsic Hilbert space arguments. Moreover, this approach has 
the additional advantage of giving another important result automatically, namely this 
proof allows us to determine the “maximal self-adjoint part” of any closed symmetric 
operator. Furthermore, it gives a concrete definition of the projectors P, as projectors 
onto subspaces which are defined explicitly in terms of the given operator. This proof 
is due to Lengyel and Stone for the case of bounded self-adjoint operators (1936). It 
was extended to the general case by Leinfelder in 1979 [1]. 

The starting point of this proof is the so-called “geometric characterization of 
self-adjointness.” It is developed in the first section. The second section will answer 
the following questions: What does dP, mean and what type of integration is used 
in formula (27.2)? Finally, using some approximation procedure and the results of 
the preceding sections, the proof of the spectral theorem and some other conclusions 
are given in the third section. 


27.1 Geometric Characterization of Self-Adjointness 


27.1.1 Preliminaries 


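Lemma 27.1 Suppose $A$ is a closed symmetric operator with domain $D$, in a
complex Hilbert space $\mathcal{H}$, and $(P_n)_{n \in \mathbb{N}}$ a sequence of orthogonal projectors with the
following properties:
a) $P_n \le P_{n+1}$, b) $\mathrm{ran}\, P_n \subseteq D$, c) $AP_n = P_n A P_n$ for all $n \in \mathbb{N}$,
and d) $\lim_{n \to \infty} P_n x = x$ for all $x \in \mathcal{H}$. Then

$$D = \{x \in \mathcal{H} : (AP_n x)_{n \in \mathbb{N}} \text{ converges in } \mathcal{H}\} \qquad (27.3)$$

$$= \{x \in \mathcal{H} : (\|AP_n x\|)_{n \in \mathbb{N}} \text{ converges in } \mathbb{R}\} \qquad (27.4)$$

and for all $x \in D$,

$$Ax = \lim_{n \to \infty} AP_n x \quad \text{in } \mathcal{H}. \qquad (27.5)$$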


Proof Condition a) and Lemma 23.1 imply $P_m P_n = P_n$ for all $m \ge n$. Therefore,
by an elementary calculation using condition c) and the symmetry of $A$, one finds

$$\langle AP_m x, AP_n x\rangle = \langle AP_n x, AP_n x\rangle = \langle AP_n x, AP_m x\rangle \quad \forall m \ge n,\ \forall x \in \mathcal{H},$$

and similarly

$$\langle Ax, AP_n x\rangle = \langle AP_n x, AP_n x\rangle = \langle AP_n x, Ax\rangle \quad \forall n,\ \forall x \in D.$$

Evaluating the norm in terms of the inner product gives

$$\|AP_m x - AP_n x\|^2 = \|AP_m x\|^2 - \|AP_n x\|^2 \quad \forall m \ge n,\ \forall x \in \mathcal{H} \qquad (27.6)$$

and

$$\|Ax - AP_n x\|^2 = \|Ax\|^2 - \|AP_n x\|^2 \quad \forall n,\ \forall x \in D. \qquad (27.7)$$

Equation (27.6) shows that the sequence $(AP_n x)_{n \in \mathbb{N}}$ converges in $\mathcal{H}$ if, and only if,
the sequence $(\|AP_n x\|)_{n \in \mathbb{N}}$ converges in $\mathbb{R}$. Thus, the two domains (27.3) and (27.4)
are the same.

Now assume $x \in \mathcal{H}$ and $(AP_n x)_{n \in \mathbb{N}}$ converges. Condition d) and the fact that $A$
is closed imply $x \in D$ and $Ax = \lim_{n \to \infty} AP_n x$. Hence, x belongs to the set (27.3).

Conversely assume $x \in D$. The identities (27.6) and (27.7) imply that
$(\|AP_n x\|)_{n \in \mathbb{N}}$ is a monotone increasing sequence which is bounded by $\|Ax\|$. We
conclude that this sequence converges and the above characterization of the domain
$D$ of $A$ is established.

The identity (27.5) results from the characterization (27.3) of the domain and the
fact that $A$ is closed.


Lemma 27.2 Suppose $\mathcal{H}$ is a finite dimensional Hilbert space and $F$ is a linear
subspace of $\mathcal{H}$. For a given symmetric operator $A$ on $\mathcal{H}$ define the function $f(x) =
\langle x, Ax\rangle$ and denote $\mu = \inf\{f(x) : x \in F,\ \|x\| = 1\}$. Then $\mu$ is an eigenvalue of $A$
and the corresponding eigenvector $e_0$ satisfies $f(e_0) = \mu$.


Proof $f$ is a continuously differentiable function on $\mathcal{H}$ with derivative $f'(x) = 2Ax$,
since $A$ is symmetric (see Chap. 30). The set $\{x \in F : \|x\| = 1\}$ is compact and hence
$f$ attains its minimum on this set. This means that there is $e_0 \in F$, $\|e_0\| = 1$, such
that $f(e_0) = \mu > -\infty$.

This is a minimization problem with constraint, namely to minimize the values
of the function $f$ on the subspace $F$ under the constraint $g(x) = \langle x, x\rangle = 1$. The
theorem about the existence of a Lagrange multiplier (see Theorem 35.2) implies:
There is an $r \in \mathbb{R}$ such that $f'(e_0) = r g'(e_0)$. The derivative of $g$ is $g'(x) = 2x$,
hence $Ae_0 = re_0$ and therefore $f(e_0) = \langle e_0, Ae_0\rangle = r\langle e_0, e_0\rangle = r$. The Lagrange
multiplier $r$ is equal to the minimum $\mu$ and we conclude.


27.1.2 Subspaces of Controlled Growth 


Given a closed symmetric operator A in an infinite dimensional Hilbert space H, we 
introduce and characterize a certain family of subspaces on which the operator A 
grows in a way determined by the characteristic parameter of the subspace. To begin, 
we introduce the subspace of those elements on which any power of the operator can 
be applied 


\[ D^\infty = D^\infty(A) = \bigcap_{n\in\mathbb{N}} D(A^n). \tag{27.8} \]




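This means $A^n x \in \mathcal{H}$ for all $x \in D^\infty$. For every $r > 0$ define a function $q_r : D^\infty \to
[0, \infty]$ by
\[ q_r(x) = \sup_{n\in\mathbb{N}} r^{-n}\|A^n x\| \quad \forall\, x \in D^\infty. \tag{27.9} \]

This function has the following properties:
\[ q_r(\alpha x) = |\alpha|\, q_r(x), \quad q_r(x+y) \le q_r(x) + q_r(y) \quad \forall\, x, y \in D^\infty,\ \forall\, \alpha \in \mathbb{C}. \]
Therefore, for every $r > 0$, the set
\[ G(A,r) = \{x \in D^\infty : q_r(x) < \infty\} \]
is a linear subspace of $\mathcal{H}$. For $r = 0$ we use $G(A,0) = N(A)$. Next the subsets of
controlled growth are introduced. For $r \ge 0$ denote
\[ F(A,r) = \{x \in D^\infty : \|A^n x\| \le r^n \|x\|,\ \forall\, n \in \mathbb{N}\}. \tag{27.10} \]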


The most important properties of these sets are described in 


Lemma 27.3 For a closed symmetric operator $A$ in the complex Hilbert space $\mathcal{H}$
the subsets $F(A,r)$ and $G(A,r)$ are actually closed subspaces of $\mathcal{H}$ which satisfy

a) $F(A,r) = G(A,r)$ for all $r \ge 0$

b) $AF(A,r) \subseteq F(A,r)$

c) If $B$ is a bounded operator on $\mathcal{H}$ which commutes with $A$ in the sense that
$BA \subseteq AB$, then, for all $r \ge 0$,
\[ BF(A,r) \subseteq F(A,r), \qquad B^* F(A,r)^\perp \subseteq F(A,r)^\perp. \tag{27.11} \]


Proof From the definition of these sets the following is evident: $G(A,r)$ is a linear
subspace which contains $F(A,r)$. The set $F(A,r)$ is invariant under scalar multiplication
but it is not evident that the sum of two of its elements again belongs to it.
Therefore, in a first step, we show the equality of these two sets.

Suppose that there is a $z \in G(A,r)$ which does not belong to $F(A,r)$. We can
assume $\|z\| = 1$. Then there is some $m \in \mathbb{N}$ such that $\|A^m z\| > r^m \|z\|$. Introduce
the auxiliary operator $S = r^{-m}A^m$. It is again symmetric and satisfies $\|Sz\| > 1$. For
every $j \in \mathbb{N}$ we estimate $\|S^j z\|^2 = \langle z, S^{2j} z\rangle \le \|S^{2j} z\|$, hence $\|S^{2^j} z\| \ge \|Sz\|^{2^j} \to
\infty$ as $j \to \infty$, but this contradicts $z \in G(A,r)$, i.e., $\|S^j z\| \le q_r(z) < \infty$. Hence
both sets are equal and part a) holds.

For $x \in G(A,r)$ the obvious estimate $q_r(Ax) \le r\, q_r(x)$ holds and it implies that
$G(A,r) = F(A,r)$ is invariant under the operator $A$, thus part b) holds.

Next, we prove that this subspace is closed. Given $y_0 \in \overline{F(A,r)}$ there is a sequence
$(x_n)_{n\in\mathbb{N}} \subseteq F(A,r)$ such that $y_0 = \lim_{n\to\infty} x_n$. Since $F(A,r)$ is a linear subspace,
$x_n - x_m \in F(A,r)$ for all $n, m \in \mathbb{N}$ and therefore $\|A^j x_n - A^j x_m\| = \|A^j(x_n - x_m)\| \le
r^j \|x_n - x_m\|$ for every $j \in \mathbb{N}$. This shows that $(A^j x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence
in $\mathcal{H}$ for every $j$. Therefore, these sequences have a limit in the Hilbert space:
\[ y_j = \lim_{n\to\infty} A^j x_n, \quad j = 0, 1, 2, \ldots \]

Now observe $(x_n)_{n\in\mathbb{N}} \subseteq F(A,r) \subseteq D(A)$ and the operator $A$ is closed. Therefore,
the identities $y_j = \lim_{n\to\infty} A^j x_n$ for $j = 0$ and $j = 1$ imply: $y_0 \in D(A)$ and
$y_1 = Ay_0$. Because of part b) we know $F(A,r)$ to be invariant under $A$, hence
$A^j x_n \in F(A,r)$ for all $n, j \in \mathbb{N}$. Hence, a proof by induction with respect to $j$
applies and proves
\[ y_j \in D(A), \quad y_j = A^j y_0, \quad j = 0, 1, 2, \ldots \]
We deduce $y_0 \in D^\infty$ and
\[ \|A^j y_0\| = \lim_{n\to\infty} \|A^j x_n\| \le \limsup_{n\to\infty} r^j \|x_n\| = r^j \|y_0\| \quad \forall\, j \in \mathbb{N}. \]
It follows that $y_0 \in F(A,r)$ and this subspace is closed.

If $B$ is a bounded operator on $\mathcal{H}$ which commutes with $A$ in the sense of $BA \subseteq AB$
we know $A^n Bx = BA^n x$ for all $x \in G(A,r)$ and all $n \in \mathbb{N}$ and therefore $q_r(Bx) \le
\|B\|\, q_r(x)$ for every $r > 0$, hence $BG(A,r) \subseteq G(A,r)$ for $r > 0$. For $r = 0$ the
subspace $G(A,0)$ is by definition the null space of $A$ which is invariant under $B$
because of the assumed commutativity with $A$, therefore $BG(A,r) \subseteq G(A,r)$ for
all $r \ge 0$.

Finally suppose $x \in G(A,r)^\perp$. For all $y \in G(A,r)$ we find $\langle B^* x, y\rangle = \langle x, By\rangle =
0$, since $BG(A,r) \subseteq G(A,r)$, and therefore $B^* x \in G(A,r)^\perp$. This proves part
c).

The restriction of the operator $A$ to the closed subspace $F(A,r)$ is bounded by $r$,
$\|Ax\| \le r\|x\|$ for all $x \in F(A,r)$. Hence the family of closed subspaces $F(A,r)$,
$r \ge 0$, controls the growth of the operator $A$. It does so actually rather precisely
since there are also lower bounds characterized by this family as we are going to
show. These lower bounds are deduced in two steps: First they are shown for finite
dimensional subspaces. Then an approximation lemma controls the general case.
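
The defining condition (27.10) can be tested directly in finite dimensions. The sketch below assumes a diagonal matrix `A = diag(eigs)`, for which $F(A,r)$ is spanned by the coordinate vectors with $|\lambda_i| \le r$. The helper `in_controlled_growth` and the cutoff `max_power` are illustrative, and checking finitely many powers is only a proxy for the condition "for all $n$".

```python
import math

# Assumption: A = diag(eigs) on R^4, so A^n v is computed coordinatewise.
eigs = [0.5, -1.0, 2.0, 3.0]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def power_apply(v, n):
    """Return A^n v for the diagonal matrix A = diag(eigs)."""
    return [(lam ** n) * c for lam, c in zip(eigs, v)]

def in_controlled_growth(v, r, max_power=60, tol=1e-12):
    """Check ||A^n v|| <= r^n ||v|| for n = 1..max_power (finite proxy)."""
    return all(
        norm(power_apply(v, n)) <= (r ** n) * norm(v) * (1 + tol)
        for n in range(1, max_power + 1)
    )

inside = [1.0, 2.0, 0.0, 0.0]    # supported on |eigenvalue| <= 1
outside = [1.0, 2.0, 1e-6, 0.0]  # tiny component on the eigenvalue 2

print(in_controlled_growth(inside, 1.0))   # True
print(in_controlled_growth(outside, 1.0))  # False
print(in_controlled_growth(outside, 2.0))  # True
```

Even a component of size $10^{-6}$ along an eigenvalue larger than $r$ is detected once the power $n$ is large enough, which is why the definition of $F(A,r)$ involves all powers of $A$.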


Lemma 27.4 For a symmetric operator $A$ in a finite dimensional complex Hilbert
space $\mathcal{H}$ one has for all $r \ge 0$ and all $x \in F(A,r)^\perp$, $x \ne 0$:
\[ \|Ax\| > r\|x\| \quad \text{and} \quad \langle x, Ax\rangle > r\|x\|^2 \ \text{ if } A \ge 0. \]


Proof The proof for the general case will be reduced to that of a positive operator.
So we start with the case $A \ge 0$. Denote $S_1 = \{x \in \mathcal{H} : \|x\| = 1\}$ and consider the
function $f(x) = \langle x, Ax\rangle$. Since $S_1 \cap F(A,r)^\perp$ is compact, $f$ attains its minimum
$\mu = \inf\{f(x) : x \in S_1 \cap F(A,r)^\perp\}$ on this set, i.e., there is an $e_0 \in S_1 \cap F(A,r)^\perp$
such that $f(e_0) = \mu$. By Lemma 27.2 the minimum $\mu$ is an eigenvalue of $A$ and
$e_0$ is the corresponding eigenvector: $Ae_0 = \mu e_0$. This proves $e_0 \in F(A,\mu)$. If we
had $\mu \le r$, then $F(A,\mu) \subseteq F(A,r)$ and thus $e_0 \in F(A,r) \cap F(A,r)^\perp = \{0\}$, a
contradiction since $\|e_0\| = 1$. Hence, the minimum must be larger than $r$: $\mu > r$.

The lower bound is now obvious: For $x \in F(A,r)^\perp$, $x \ne 0$, write $\langle x, Ax\rangle =
\|x\|^2 \langle y, Ay\rangle$ with $y \in S_1 \cap F(A,r)^\perp$, thus $\langle x, Ax\rangle \ge \|x\|^2 \mu > r\|x\|^2$ which is
indeed the lower bound of $A$ for $A \ge 0$.




Since $A$ is symmetric it leaves the subspaces $F(A,r)$ and $F(A,r)^\perp$ invariant.
It follows that the restriction $B = A|_{F(A,r)^\perp}$ is a symmetric operator $F(A,r)^\perp \to
F(A,r)^\perp$ which satisfies $\|Ax\|^2 = \langle x, B^2 x\rangle$ for all $x \in F(A,r)^\perp$. As above we
conclude that
\[ \mu^2 = \inf\{\|Ax\|^2 : x \in S(r)\} = \inf\{\langle x, B^2 x\rangle : x \in S(r)\} \]
is an eigenvalue of $B^2$ (we use the abbreviation $S(r) = S_1 \cap F(A,r)^\perp$). Elementary
rules for determinants say
\[ 0 = \det(B^2 - \mu^2 I) = \det(B - \mu I)\det(B + \mu I) \]
and therefore either $+\mu$ or $-\mu$ is an eigenvalue of $B$. As above we prove $|\mu| > r$
and for $x \in F(A,r)^\perp$, $x \ne 0$, write $\|Ax\| = \|x\|\,\|Ay\|$ with $y \in S(r)$ and thus
$\|Ax\| \ge \|x\|\,|\mu| > r\|x\|$.


Lemma 27.5 Let $A$ be a closed symmetric operator in a complex Hilbert space
$\mathcal{H}$. Introduce the closed subspace of controlled growth $F(A,r)$ as above and choose
any $0 \le r < s$. Then, for every given $x \in F(A,s) \cap F(A,r)^\perp$ there are a sequence
$(H_n)_{n\in\mathbb{N}}$ of finite dimensional subspaces of $\mathcal{H}$, a sequence of symmetric operators
$A_n : H_n \to H_n$ and a sequence of vectors $x_n \in H_n$, $n \in \mathbb{N}$, such that
\[ x_n \in F(A_n,s) \cap F(A_n,r)^\perp \quad \forall\, n \in \mathbb{N}, \]
\[ \lim_{n\to\infty} \|x - x_n\| = 0 = \lim_{n\to\infty} \|Ax - A_n x_n\|. \]


Proof According to Lemma 27.3 the subspaces $F(A,s)$ and $F(A,r)^\perp$ are invariant
under the symmetric operator $A$. Therefore, given $x \in F(A,s) \cap F(A,r)^\perp \subseteq D^\infty$,
we know that
\[ H_n = H_n(x) = \operatorname{span}\{x, Ax, \ldots, A^n x\} \subseteq F(A,s) \cap F(A,r)^\perp \subseteq D^\infty. \]
Clearly the dimension of $H_n$ is smaller than or equal to $n + 1$. From Lemma 27.3,
we also deduce
\[ H_n \subseteq H_{n+1} \subseteq \cdots \subseteq H_\infty = \overline{\bigcup_{n\in\mathbb{N}} H_n} \subseteq F(A,s) \cap F(A,r)^\perp. \]

Introduce the orthogonal projectors $P_n$ onto $H_n$ and $P$ onto $H_\infty$ and observe
$\lim_{n\to\infty} P_n y = Py$ for every $y \in \mathcal{H}$. Next, we define the reductions of the operator
$A$ to these subspaces: $A_n = (P_n A P_n)|_{H_n}$. It follows that $A_n$ is a symmetric
operator on $H_n$, and if $A \ge 0$ is positive so is $A_n$.

We prepare the proof of the approximation by an important convergence property
of the reduced operators $A_n$:
\[ \lim_{n\to\infty} (P_n A P_n)^j y = (PAP)^j y \quad \forall\, j \in \mathbb{N},\ \forall\, y \in \mathcal{H}. \tag{27.12} \]

Equation (27.12) is shown by induction with respect to $j$. Since $H_\infty \subseteq F(A,s) \subseteq
D^\infty$ we know $Py \in D^\infty$ and thus
\[ \|PAPy - P_n A P_n y\| \le \|(P - P_n)APy\| + \|A(Py - P_n y)\| \]
\[ \le \|(P - P_n)APy\| + s\,\|Py - P_n y\|. \]

Since $\|(P - P_n)z\| \to 0$ as $n \to \infty$, Eq. (27.12) holds for $j = 1$. Now suppose that
Eq. (27.12) holds for all $j \le k$ for some $k \ge 1$. Then we estimate as follows:
\[ \|(PAP)^{k+1} y - (P_n A P_n)^{k+1} y\| \]
\[ = \|(P_n A P_n)[(PAP)^k - (P_n A P_n)^k] y + (PAP - P_n A P_n)(PAP)^k y\| \]
\[ \le \|(P_n A P_n)[(PAP)^k - (P_n A P_n)^k] y\| + \|(PAP - P_n A P_n)(PAP)^k y\| \]
\[ \le s\,\|[(PAP)^k - (P_n A P_n)^k] y\| + \|(PAP - P_n A P_n)(PAP)^k y\|. \]

As $n \to \infty$ the upper bound in this estimate converges to zero, because of our
induction hypothesis. Therefore, Eq. (27.12) follows for all $j$.


After these preparations the main construction of the approximations can be done.
Since $H_\infty$ is invariant under the operator $A$, Eq. (27.12) implies for all $y \in H_\infty$,
\[ \lim_{n\to\infty} (P_n A P_n)^j y = (PAP)^j y = A^j y \quad \forall\, j \in \mathbb{N}. \tag{27.13} \]

The given $x \in F(A,s) \cap F(A,r)^\perp$ satisfies $x \in H_n$ for all $n \in \mathbb{N}$. Thus, we can project
it onto the subspaces $F(A_n,r) = F(A,r) \cap H_n$ and their orthogonal complement:
\[ x = x_n \oplus y_n, \quad x_n \in F(A_n,r)^\perp, \quad y_n \in F(A_n,r), \quad \forall\, n \in \mathbb{N}. \]

Since $\|x\|^2 = \|x_n\|^2 + \|y_n\|^2$ the sequence $(y_n)_{n\in\mathbb{N}}$ contains a weakly convergent
subsequence $(y_{n(i)})_{i\in\mathbb{N}}$ with a limit denoted by $y$. Since all elements of the subsequence
belong to the space $H_\infty$ which is strongly closed and thus weakly closed, this
weak limit $y$ belongs to $H_\infty \subseteq F(A,s) \cap F(A,r)^\perp$. We are going to show $y = 0$ by
showing that this weak limit $y$ also belongs to $F(A,r)$.

For any $k \in \mathbb{N}$, Eq. (27.13) implies
\[ \|A^k y\|^2 = \langle y, A^{2k} y\rangle = \lim_{i\to\infty} \langle y, (P_{n(i)} A P_{n(i)})^{2k} y\rangle = \lim_{i\to\infty} \langle y, (P_{n(i)} A P_{n(i)})^{2k} y_{n(i)}\rangle \]
since $(P_{n(i)} A P_{n(i)})^{2k} y$ converges strongly to $A^{2k} y$ and $y_{n(i)}$ weakly to $y$. We can estimate
now as follows, using $y_{n(i)} \in F(A_{n(i)},r)$:
\[ \|A^k y\|^2 \le \limsup_{i\to\infty} \|y\|\,\|(P_{n(i)} A P_{n(i)})^{2k} y_{n(i)}\| \le \limsup_{i\to\infty} \|y\|\, r^{2k} \|y_{n(i)}\| \le r^{2k} \|y\|\,\|x\|, \]
hence $y \in F(A,r)$, and we conclude $y = 0$.




Finally, we can establish the statements of the lemma for the sequence $(x_{n(i)})_{i\in\mathbb{N}}$
corresponding to the weakly convergent subsequence $(y_{n(i)})_{i\in\mathbb{N}}$. For simplicity of notation
we denote these sequences $(x_n)_{n\in\mathbb{N}}$, respectively $(y_n)_{n\in\mathbb{N}}$. The elements $x_n$ have been
defined as the projections onto $F(A_n,r)^\perp \subseteq H_n \subseteq F(A,s)$. Hence, the first part of the
statement follows, since $H_n \cap F(A,s) = F(A_n,s)$. Note $\|x - x_n\|^2 = \langle x - x_n, y_n\rangle =
\langle x, y_n\rangle$ and recall that the sequence $(y_n)_{n\in\mathbb{N}}$ converges weakly to zero, thus $\|x - x_n\|^2 \to 0$
as $n \to \infty$. According to the construction of the spaces $H_n$, the elements $x$, $Ax$ are
contained in them, thus the identity $Ax = P_n A P_n x$ holds automatically. This gives the
estimate
\[ \|Ax - A_n x_n\| = \|A_n x - A_n x_n\| = \|(P_n A P_n)(x - x_n)\| \le s\,\|x - x_n\|, \]
and the approximation for the operator $A$ follows.

The combination of the last two lemmas allows us to control the growth of the
operator $A$ on the family of subspaces $F(A,r)$, $r \ge 0$.


Theorem 27.1 Let $A$ be a closed symmetric operator on the complex Hilbert space
$\mathcal{H}$ and introduce the family of subspaces $F(A,r)$, $r \ge 0$, according to Eq. (27.10).
Choose any two numbers $0 \le r < s$. Then for every $x \in F(A,s) \cap F(A,r)^\perp$ the
following estimates hold:
\[ r\|x\| \le \|Ax\| \le s\|x\| \quad \text{and} \quad r\|x\|^2 \le \langle x, Ax\rangle \le s\|x\|^2 \ \text{ if } A \ge 0. \tag{27.14} \]


Proof If $x \in F(A,s) \cap F(A,r)^\perp$, approximate it according to Lemma 27.5 by
elements $x_n \in F(A_n,s) \cap F(A_n,r)^\perp$ and the operator $A$ by symmetric operators $A_n$
in the finite dimensional Hilbert space $H_n$. Now apply Lemma 27.4 to get, for all
$n \in \mathbb{N}$,
\[ r\|x_n\| \le \|A_n x_n\| \quad \text{and} \quad r\|x_n\|^2 \le \langle x_n, A_n x_n\rangle \ \text{ if } A \ge 0. \]
To conclude, take the limit $n \to \infty$ in these estimates which is possible by Lemma
27.5.

The family of subspaces $F(A,r)$, $r \ge 0$, thus controls the growth of the operator $A$
with considerable accuracy (choose $r < s$ close to each other). This family can also
be used to decide whether the operator $A$ is self-adjoint.
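
The two-sided estimate (27.14) can be observed in a finite dimensional toy case. The sketch below assumes a positive diagonal matrix `A = diag(eigs)`, for which $F(A,s) \cap F(A,r)^\perp$ consists of the vectors supported on eigenvalues $\lambda$ with $r < \lambda \le s$. The names `band_vector` and `check_bounds` are illustrative.

```python
import math
import random

# Assumption: A = diag(eigs) >= 0 on R^5; for such A the subspace F(A, r)
# is spanned by the coordinate vectors whose eigenvalue is <= r.
eigs = [0.2, 0.9, 1.4, 2.7, 4.0]
r, s = 1.0, 3.0

def band_vector(rng):
    """Random vector in F(A, s) intersected with the orthogonal complement
    of F(A, r): supported on the coordinates with r < eigenvalue <= s."""
    return [rng.uniform(-1, 1) if r < lam <= s else 0.0 for lam in eigs]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def check_bounds(v):
    """Return True if v satisfies both estimates in (27.14)."""
    av = [lam * c for lam, c in zip(eigs, v)]
    quad = sum(c * d for c, d in zip(v, av))  # the quadratic form <x, Ax>
    nv = norm(v)
    return (r * nv <= norm(av) <= s * nv) and (r * nv ** 2 <= quad <= s * nv ** 2)

rng = random.Random(0)
samples = [band_vector(rng) for _ in range(1000)]
print(all(check_bounds(v) for v in samples))  # True
print(check_bounds([1.0, 0.0, 0.0, 0.0, 0.0]))  # False: eigenvalue 0.2 < r
```

Choosing $r$ and $s$ close together pins $\|Ax\|/\|x\|$ to a narrow window, which is the sense in which the family $F(A,r)$ controls the growth of $A$.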


Theorem 27.2 (Geometric Characterization of Self-Adjointness) A closed symmetric
operator $A$ in a complex Hilbert space $\mathcal{H}$ is self-adjoint if, and only if,
\[ \bigcup_{n\in\mathbb{N}} F(A,n) \]
is dense in $\mathcal{H}$. Here the subspaces of controlled growth $F(A,n)$ are defined in
Eq. (27.10).


Proof According to Lemma 27.3 the closed subspaces $F(A,n)$ satisfy $F(A,n) \subseteq
F(A,n+1)$ for all $n \in \mathbb{N}$, hence their union is a linear subspace too. Denote by $P_n$ the
orthogonal projector onto $F(A,n)$. It follows that $(P_n)_{n\in\mathbb{N}}$ is a monotone increasing
family of projectors on $\mathcal{H}$. Thus, if $\bigcup_{n\in\mathbb{N}} F(A,n)$ is assumed to be dense in $\mathcal{H}$ this
sequence of projectors converges strongly to the identity operator $I$. In order to show
that the closed symmetric operator $A$ is self-adjoint it suffices to show that the domain
$D(A^*)$ of the adjoint $A^*$ is contained in the domain $D(A)$ of the operator $A$.

Consider any $x \in D(A^*)$. Since $P_n$ projects onto $F(A,n) \subseteq D(A) \subseteq D(A^*)$,
we can write $A^* x = A^*(x - P_n x) + A^* P_n x = A^*(I - P_n)x + AP_n x$. Since the
subspace $F(A,n)$ is invariant under $A$ and since $I - P_n$ projects onto $F(A,n)^\perp$, one
has $\langle A^*(I - P_n)x, AP_n x\rangle = \langle (I - P_n)x, A^2 P_n x\rangle = 0$. This implies
\[ \|A^* x\|^2 = \|A^*(I - P_n)x\|^2 + \|AP_n x\|^2. \]

Therefore, the sequence $(AP_n x)_{n\in\mathbb{N}}$ is norm bounded, and thus there is a weakly
convergent subsequence $(AP_{n(i)} x)_{i\in\mathbb{N}}$. Since $(P_{n(i)} x)_{i\in\mathbb{N}}$ is weakly convergent too
and since an operator is closed if, and only if, it is weakly closed, we conclude that
the weak limit $x$ of the sequence $(P_{n(i)} x)_{i\in\mathbb{N}}$ belongs to the domain $D(A)$ of $A$ and
the sequence $(AP_{n(i)} x)_{i\in\mathbb{N}}$ converges weakly to $Ax$. This proves $D(A^*) \subseteq D(A)$ and
thus self-adjointness of $A$.

Conversely assume that the operator $A$ is self-adjoint. We assume in addition that
$A \ge I$. In this case, the proof is technically much simpler. At the end we comment
on the necessary changes for the general case which uses the same basic ideas. As
we know the space $\bigcup_{n\in\mathbb{N}} F(A,n)$ is dense in $\mathcal{H}$ if, and only if,
\[ \Big(\bigcup_{n\in\mathbb{N}} F(A,n)\Big)^\perp = \bigcap_{n\in\mathbb{N}} F(A,n)^\perp = \{0\}. \]

The assumption $A \ge I$ implies that $A^{-1}$ is a bounded self-adjoint operator $\mathcal{H} \to
D(A)$ which commutes with $A$. Form the spaces $F(A^{-1},r)$, $r \ge 0$. Lemma 27.3
implies that $A^{-1}$ maps the closed subspace $H_r = F(A^{-1},r^{-1})^\perp$ into itself. Hence,
$B_r = (A^{-1})|_{H_r}$ is a well-defined bounded linear operator on $H_r$. Theorem 27.1 applies
to the symmetric operator $B_r$. Therefore, for all $x \in F(A^{-1},r^{-1})^\perp \cap F(A^{-1},s)$,
$s = \|B_r\|$, the lower bound $\|B_r x\| = \|A^{-1} x\| \ge \frac{1}{r}\|x\|$ is available. We conclude that
$B_r : H_r \to H_r$ is bijective. Hence for every $x_0 \in H_r$ there is exactly one $x_1 \in H_r$ such
that $x_0 = B_r x_1 = A^{-1} x_1$. This implies $x_0 \in D(A)$ and $x_1 = Ax_0 \in H_r$. Iteration of
this argument produces a sequence $x_n = A^n x_0 \in H_r \cap D(A) = F(A^{-1},r^{-1})^\perp \cap D(A)$,
$n \in \mathbb{N}$. This implies $x_0 \in D^\infty$ and $\|x_n\| = \|A^{-1} x_{n+1}\| \ge r^{-1}\|x_{n+1}\| = r^{-1}\|Ax_n\|$,
hence $\|A^n x_0\| \le r^n \|x_0\|$ for all $n \in \mathbb{N}$, or $x_0 \in F(A,r)$ and thus
\[ F(A^{-1},r^{-1})^\perp \subseteq F(A,r) \quad \forall\, r > 0. \]
This holds in particular for $r = n \in \mathbb{N}$, hence
\[ \bigcap_{n\in\mathbb{N}} F(A,n)^\perp \subseteq \bigcap_{n\in\mathbb{N}} F\big(A^{-1},\tfrac{1}{n}\big) = N(A^{-1}) = \{0\}. \]

This concludes the proof for the case $A \ge I$.
Now we comment on the proof for the general case. For a self-adjoint operator
$A$ the resolvent $R_A(z) = (A - zI)^{-1} : \mathcal{H} \to D(A)$ is well defined for all $z \in \mathbb{C} \setminus \mathbb{R}$.
Clearly, the resolvent commutes with $A$ and is injective. In the argument given
above replace the operator $A^{-1}$ by the operator $B = R_A(z)^* R_A(z) = R_A(\bar{z}) R_A(z)$.
This allows us to show, for all $r > 0$,
\[ F(B,r)^\perp \subseteq F\big(A, |z| + \tfrac{1}{r}\|R_A(z)\|\big). \]

Now, for $n > |z|$ denote $r_n = \frac{1}{n - |z|}\|R_A(z)\|$, then $F(B,r_n)^\perp \subseteq F(A,n)$ and therefore
\[ \bigcap_{n > |z|} F(A,n)^\perp \subseteq \bigcap_{n > |z|} F(B,r_n) = N(B) = \{0\}, \]
and we conclude as in the case $A \ge I$.


27.2 Spectral Families and Their Integrals 


In Proposition 23.1, we learned that there is a one-to-one correspondence between 
closed subspaces of a Hilbert space and orthogonal projections. In the previous 
section, the family of subspaces of controlled growth was introduced for a closed
symmetric operator $A$. Thus, we have a corresponding family of orthogonal projections
on the Hilbert space which will finally lead to the spectral representation of
self-adjoint operators. Before this can be done the basic theory of such families of
projectors and their integrals has to be studied.


27.2.1 Spectral Families 


The correspondence between a family of closed subspaces of a complex Hilbert space 
and the family of projectors onto these subspaces is investigated in this section in 
some detail. Our starting point is 


Definition 27.1 Let $\mathcal{H}$ be a complex Hilbert space and $E$ a function on $\mathbb{R}$ with
values in the space $\mathcal{P}(\mathcal{H})$ of all orthogonal projection operators on $\mathcal{H}$. $E$ is called
a spectral family on $\mathcal{H}$ or resolution of the identity if, and only if, the following
conditions are satisfied.

a) $E$ is monotone: $E_t E_s = E_{t \wedge s}$ for all $t, s \in \mathbb{R}$, where $t \wedge s = \min\{t, s\}$
b) $E$ is right continuous with respect to the strong topology, i.e.,
$\lim_{s\to t,\, s>t} \|E_s x - E_t x\| = 0$ for all $x \in \mathcal{H}$ and all $t \in \mathbb{R}$
c) $E$ is normalized, i.e., $\lim_{t\to -\infty} E_t x = 0$ and $\lim_{t\to +\infty} E_t x = x$ for every $x \in \mathcal{H}$.

The support of a spectral family $E$ is $\operatorname{supp} E = \{t \in \mathbb{R} : E_t \ne 0,\ E_t \ne I\}$.
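
The three conditions of Definition 27.1 can be checked on the simplest example. The sketch below assumes a diagonal matrix `A = diag(eigs)` and takes $E_t$ to be the projector onto the coordinates with eigenvalue $\le t$. The function names are illustrative, and each projector is stored by its diagonal of 0/1 entries.

```python
# Assumption: A = diag(eigs) on R^4; E_t is the diagonal 0/1 projector onto
# the coordinates whose eigenvalue is <= t, stored as a list of 0/1 entries.
eigs = [-1.0, 0.5, 0.5, 2.0]

def spectral_projector(t):
    """Diagonal of E_t: 1 where eigenvalue <= t, else 0."""
    return [1 if lam <= t else 0 for lam in eigs]

def product(p, q):
    """Product of two diagonal projectors (entrywise)."""
    return [a * b for a, b in zip(p, q)]

ts = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 5.0]

# a) monotone: E_t E_s = E_{min(t, s)}
monotone = all(
    product(spectral_projector(t), spectral_projector(s))
    == spectral_projector(min(t, s))
    for t in ts for s in ts
)

# b) right continuous at the jump t = 0.5, but not left continuous there
right_limit = spectral_projector(0.5 + 1e-9)
left_limit = spectral_projector(0.5 - 1e-9)

print(monotone)                                # True
print(right_limit == spectral_projector(0.5))  # True
print(left_limit == spectral_projector(0.5))   # False

# c) normalized: E_t = 0 far to the left, E_t = I far to the right
print(spectral_projector(-10.0), spectral_projector(10.0))
```

In this example $E_t$ jumps exactly at the eigenvalues, and the support in the sense above is the interval $[-1, 2)$.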




Given a spectral family $E$ on $\mathcal{H}$ we get a family of closed subspaces $H_t$ of $\mathcal{H}$ by
defining
\[ H_t = \operatorname{ran} E_t, \quad \forall\, t \in \mathbb{R}. \]

In the following proposition the defining properties a)–c) of a spectral family are
translated into properties of the family of associated closed subspaces.


Proposition 27.1 Let $\{E_t\}_{t\in\mathbb{R}}$ be a spectral family on $\mathcal{H}$. Then the family of closed
subspaces $H_t = \operatorname{ran} E_t$ has the following properties:

a) Monotonicity: $H_s \subseteq H_t$ for all $s \le t$
b) Right continuity: $H_s = \bigcap_{t>s} H_t$
c) Normalization: $\bigcap_{t\in\mathbb{R}} H_t = \{0\}$ and $\overline{\bigcup_{t\in\mathbb{R}} H_t} = \mathcal{H}$

Conversely, given a family of closed subspaces $H_t$, $t \in \mathbb{R}$, of $\mathcal{H}$ with the properties
a)–c) then the family of orthogonal projectors $E_t$ onto $H_t$, $t \in \mathbb{R}$, is a spectral family
on $\mathcal{H}$.


Proof The monotonicity condition a) for the spectral family is easily translated into
that of the family of ranges $H_t$ by Lemma 23.1. This implies $H_s \subseteq H_t$ for all $s \le t$
and therefore $H_s \subseteq \bigcap_{s<t} H_t$. For any $x \in \bigcap_{s<t} H_t$ we know $E_t x = x$ for all $t > s$,
hence $x = \lim_{t\to s,\, t>s} E_t x = E_s x$, i.e., $x \in \operatorname{ran} E_s = H_s$ since a spectral family is
right continuous. This proves b) for the family $H_t$, $t \in \mathbb{R}$.

The normalization for the spectral family $\lim_{t\to +\infty} E_t x = x$ for all $x \in \mathcal{H}$ implies
immediately that the closure of the union of all the subspaces $H_t$ gives the whole
Hilbert space. Next consider $x \in \bigcap_{t\in\mathbb{R}} H_t$, then $x = E_t x$ for all $t \in \mathbb{R}$ and thus
$x = \lim_{t\to -\infty} E_t x = 0$ because of the normalization for the spectral family. This
proves the normalization for the family of subspaces $H_t$.

If a family of closed subspaces $H_t$, $t \in \mathbb{R}$, with the properties a)–c) is given,
define a family of orthogonal projectors by defining $E_t$ as the orthogonal projector
onto the subspace $H_t$ for all $t \in \mathbb{R}$. Suppose $s \le t$, then $H_s \subseteq H_t$ and Lemma 23.1
implies $E_s = E_s E_t = E_t E_s \le E_t$ and thus monotonicity of the family of projectors.
According to Theorem 23.1, a monotone family of projectors has a strong
limit which is again an orthogonal projector. Hence, for every $x \in \mathcal{H}$ we know
$\lim_{t\to s,\, t>s} E_t x = Px$ for some orthogonal projector $P$ on $\mathcal{H}$. The condition b) for
the family of subspaces $H_t$ implies
\[ \operatorname{ran} P = \bigcap_{t>s} \operatorname{ran} E_t = \bigcap_{t>s} H_t = H_s = \operatorname{ran} E_s, \]
thus $P = E_s$ by part d) of Proposition 23.1. Therefore, the function $t \mapsto E_t \in \mathcal{P}(\mathcal{H})$
is right continuous.

Since $t \mapsto E_t$ is monotone the following strong limits exist (Theorem 23.1):
$\lim_{t\to -\infty} E_t = Q_-$ and $\lim_{t\to +\infty} E_t = Q_+$ with $\operatorname{ran} Q_- = \bigcap_{t\in\mathbb{R}} \operatorname{ran} E_t =
\bigcap_{t\in\mathbb{R}} H_t = \{0\}$ and $\operatorname{ran} Q_+ = \overline{\bigcup_{t\in\mathbb{R}} \operatorname{ran} E_t} = \overline{\bigcup_{t\in\mathbb{R}} H_t} = \mathcal{H}$ and again by Proposition
23.1 we conclude $Q_- = 0$ and $Q_+ = I$ which are the normalization conditions
of a spectral family.




27.2.2 Integration with Respect to a Spectral Family 


Given a spectral family $E_t$ on a complex Hilbert space $\mathcal{H}$ and a continuous function
$f : [a,b] \to \mathbb{R}$, we explain the definition and the properties of the integral of $f$ with
respect to the spectral family:
\[ \int_a^b f(t)\,dE_t. \tag{27.15} \]

The definition of this integral is done in close analogy to the Stieltjes integral.
Accordingly we strongly recommend studying the construction of the Stieltjes integral
first.

There is naturally a close connection of the Stieltjes integral with the integral
(27.15). Given any $x \in \mathcal{H}$ define $\rho_x(t) = \langle x, E_t x\rangle$ for all $t \in \mathbb{R}$. Then $\rho_x$ is a
monotone increasing function of finite total variation and thus a continuous function
$f$ has a well-defined Stieltjes integral $\int_a^b f(t)\,d\rho_x(t)$ with respect to $\rho_x$ and one finds
according to the definition of the integral (27.15)
\[ \int_a^b f(t)\,d\rho_x(t) = \Big\langle x, \int_a^b f(t)\,dE_t\, x \Big\rangle. \]

For a given spectral family $E_t$ on the complex Hilbert space $\mathcal{H}$ and any $s < t$
introduce
\[ E(s,t] = E_t - E_s. \tag{27.16} \]
In the Exercises, we show that $E(s,t]$ is an orthogonal projector on $\mathcal{H}$ with range
\[ \operatorname{ran} E(s,t] = H_t \cap H_s^\perp = H_t \ominus H_s. \]
Since a spectral family is not necessarily left continuous, the operator
\[ P(t) = \lim_{s\to t,\, s<t} E(s,t] = E_t - E_{t-0} \tag{27.17} \]
is in general a projector $\ne 0$. Indeed, $P(t) \ne 0$ if, and only if, $E_t$ is discontinuous at
$t$ (for the strong topology). If $(s_1,t_1]$ and $(s_2,t_2]$ are two disjoint intervals, then
\[ E(s_1,t_1]\,E(s_2,t_2] = 0. \tag{27.18} \]

A partition $Z$ of the interval $[a,b]$ is a decomposition of $[a,b]$ into a finite number
of disjoint subintervals together with a choice of one point in each subinterval:
\[ a = t_0 < t_1 < \cdots < t_n = b, \quad t_j' \in (t_{j-1}, t_j], \quad j = 1, \ldots, n. \tag{27.19} \]




This is denoted as $Z = Z(t_j, t_j', n)$. The number
\[ |Z| = \max\{|t_j - t_{j-1}| : j = 1, \ldots, n\} \]
is called the width of the partition $Z$. It is the length of the largest subinterval. Given
two partitions $Z(t_j, t_j', n)$ and $Z(s_i, s_i', m)$ we can form their union
\[ Z(t_j, t_j', n) \vee Z(s_i, s_i', m) = Z(\tau_k, \tau_k', p) \]
where $\tau_1, \ldots, \tau_p$ is an enumeration of the points $\{t_1, \ldots, t_n, s_1, \ldots, s_m\}$ in their
natural order with the corresponding selection of $\tau_k' \in \{t_j', s_i'\}$. Obviously the
width of this union is smaller than or equal to the widths of the original partitions.
Thus this union is also called the joint refinement of the two partitions.

For a partition $Z = Z(t_j, t_j', n)$ of the interval $[a,b]$ and a continuous function
$f : [a,b] \to \mathbb{R}$, form the sum
\[ \Sigma(f,Z) = \sum_{j=1}^{n} f(t_j')\,E(t_{j-1},t_j]. \tag{27.20} \]
Relation (27.18) implies $E(t_{i-1},t_i]\,E(t_{j-1},t_j] = \delta_{ij}\,E(t_{j-1},t_j]$ and thus for all $x \in \mathcal{H}$,
\[ \sum_{j=1}^{n} \|E(t_{j-1},t_j]x\|^2 = \Big\|\sum_{j=1}^{n} E(t_{j-1},t_j]x\Big\|^2 = \|E(a,b]x\|^2 \le \|x\|^2. \tag{27.21} \]

Apply the identity (27.20) to any $x \in \mathcal{H}$. Then the orthogonality relation (27.18)
implies
\[ \|\Sigma(f,Z)x\|^2 = \sum_{j=1}^{n} |f(t_j')|^2\, \|E(t_{j-1},t_j]x\|^2 \tag{27.22} \]
which according to the relation (27.21) leads to the estimate
\[ \|\Sigma(f,Z)x\| \le \sup\{|f(t)| : t \in [a,b]\}\, \|E(a,b]x\|. \tag{27.23} \]

This proves that $\Sigma(f,Z)$ is a bounded linear operator on $\mathcal{H}$. Now we study the limit
of these bounded operators when the partition $Z$ gets finer and finer, i.e., the limit
$|Z| \to 0$. Suppose $Z = Z(t_j, t_j', n)$ and $Z' = Z(s_i, s_i', m)$ are two given partitions
and $Z(\tau_k, \tau_k', p)$ is their joint refinement, then, for any $x \in \mathcal{H}$,
\[ \Sigma(f,Z)x - \Sigma(f,Z')x = \sum_{j=1}^{n} f(t_j')\,E(t_{j-1},t_j]x - \sum_{i=1}^{m} f(s_i')\,E(s_{i-1},s_i]x = \sum_{k=1}^{p} \varepsilon_k\, E(\tau_{k-1},\tau_k]x, \]
where $\varepsilon_k = f(t_j') - f(s_i')$ for the indices $j$, $i$ with $(\tau_{k-1},\tau_k] \subseteq (t_{j-1},t_j] \cap (s_{i-1},s_i]$,
and because of the orthogonality of the projectors $E(\tau_{k-1},\tau_k]$,
\[ \|\Sigma(f,Z)x - \Sigma(f,Z')x\|^2 = \sum_{k=1}^{p} |\varepsilon_k|^2\, \|E(\tau_{k-1},\tau_k]x\|^2. \]

Given any $\varepsilon > 0$, there is a $\delta > 0$ such that $|f(t) - f(s)| < \varepsilon$ whenever $|s - t| \le \delta$
since $f$ is uniformly continuous. If the widths of the partitions $Z$, $Z'$ are both smaller
than or equal to $\delta$, the width of their joint refinement is also smaller than or equal to
$\delta$ and thus we can estimate
\[ |\varepsilon_k| = |f(t_j') - f(s_i')| \le |f(t_j') - f(\tau_k)| + |f(\tau_k) - f(s_i')| < 2\varepsilon, \]
since $|t_j' - \tau_k| \le \delta$ and $|s_i' - \tau_k| \le \delta$. As in estimate (27.21) we obtain
\[ \|\Sigma(f,Z)x - \Sigma(f,Z')x\| \le 2\varepsilon\, \|E(a,b]x\| \tag{27.24} \]
and conclude that the bounded operators $\Sigma(f,Z)$ have a strong limit as $|Z| \to 0$
and that this limit does not depend on the particular choice of the net of partitions $Z$
which is used in its construction. We summarize our results in


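Theorem 27.3 Let $E_t$, $t \in \mathbb{R}$, be a spectral family on the complex Hilbert space $\mathcal{H}$
and $[a,b]$ some finite interval. Then for every continuous function $f : [a,b] \to \mathbb{R}$
the integral of $f$ with respect to the spectral family $E_t$,
\[ \int_a^b f(t)\,dE_t \tag{27.25} \]
is well defined by
\[ \int_a^b f(t)\,dE_t\, x = \lim_{|Z|\to 0} \Sigma(f,Z)x. \tag{27.26} \]

It is a bounded linear operator on $\mathcal{H}$ with the following properties:

a) $\big\|\int_a^b f(t)\,dE_t\, x\big\| \le \sup\{|f(t)| : t \in [a,b]\}\, \|E(a,b]x\| \quad \forall\, x \in \mathcal{H}$
b) $f \mapsto \int_a^b f(t)\,dE_t$ is linear on $C([a,b];\mathbb{R})$
c) for every $a < c < b$: $\int_a^b f(t)\,dE_t = \int_a^c f(t)\,dE_t + \int_c^b f(t)\,dE_t$
d) $\big(\int_a^b f(t)\,dE_t\big)^* = \int_a^b f(t)\,dE_t$
e) $\big\|\int_a^b f(t)\,dE_t\, x\big\|^2 = \int_a^b |f(t)|^2\,d\rho_x(t)$
for all $x \in \mathcal{H}$ where $\rho_x(t) = \langle x, E_t x\rangle = \|E_t x\|^2$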


Proof We have shown above that the limit (27.26) exists for every $x \in \mathcal{H}$. Taking
this limit in estimate (27.23) gives Property a). Since $\Sigma(f,Z)$ is a bounded linear
operator on $\mathcal{H}$ we deduce that $\int_a^b f(t)\,dE_t$ is a bounded linear operator. Properties b)–d)
follow from the corresponding properties of the approximations $\Sigma(f,Z)$ which
are easy to establish. The details are left as an exercise.

The starting point for the proof of part e) is Eq. (27.22) and the observation
\[ \|E(t_{j-1},t_j]x\|^2 = \rho_x(t_j) - \rho_x(t_{j-1}), \]
which allows one to rewrite this equation as
\[ \|\Sigma(f,Z)x\|^2 = \sum_{j=1}^{n} |f(t_j')|^2\, [\rho_x(t_j) - \rho_x(t_{j-1})]. \]
Now in the limit $|Z| \to 0$ the identity of part e) follows since the right-hand side is
just the approximation of the Stieltjes integral $\int_a^b |f(t)|^2\,d\rho_x(t)$ for the same partition
$Z$.
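
The convergence of the sums (27.20) can be observed numerically. The sketch below assumes a diagonal matrix `A = diag(eigs)` with all eigenvalues in $(a,b]$ and the spectral family $E_t$ projecting onto the coordinates with eigenvalue $\le t$. In that setting the limit of the sums is the diagonal matrix with entries $f(\lambda_i)$. The uniform partition and the right-endpoint tags $t_j' = t_j$ are illustrative choices.

```python
import math

# Assumption: A = diag(eigs) with spectral family E_t = 1{eigenvalue <= t};
# all operators are diagonal, so they are stored as lists of diagonal entries.
eigs = [0.3, 1.1, 1.1, 2.6]
a, b = 0.0, 3.0

def e_interval(s, t):
    """Diagonal of E(s, t] = E_t - E_s."""
    return [1.0 if s < lam <= t else 0.0 for lam in eigs]

def stieltjes_sum(f, n):
    """Sigma(f, Z) for the uniform partition of [a, b] with n cells and
    tags t'_j = t_j (right endpoints)."""
    total = [0.0] * len(eigs)
    for j in range(1, n + 1):
        t_prev = a + (b - a) * (j - 1) / n
        t_j = a + (b - a) * j / n
        proj = e_interval(t_prev, t_j)
        total = [acc + f(t_j) * p for acc, p in zip(total, proj)]
    return total

f = math.sin
exact = [f(lam) for lam in eigs]  # diagonal of f(A)

def sup_error(n):
    """Largest diagonal deviation between Sigma(f, Z) and f(A)."""
    return max(abs(u - v) for u, v in zip(stieltjes_sum(f, n), exact))

for n in (10, 100, 1000, 10000):
    print(n, sup_error(n))
```

Since $|f'| \le 1$ for the sine, the error in this sketch is at most the width $|Z| = (b-a)/n$, in line with the uniform continuity argument leading to (27.24).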


Lemma 27.6 Suppose $E_t$, $t \in \mathbb{R}$, is a spectral family in the complex Hilbert space
$\mathcal{H}$ and $f : [a,b] \to \mathbb{R}$ a continuous function. Then for any $s < t$ the integral
$\int_a^b f(u)\,dE_u$ commutes with the projectors $E(s,t]$ and one has
\[ E(s,t] \int_a^b f(u)\,dE_u = \int_a^b f(u)\,dE_u\, E(s,t] = \int_{(a,b] \cap (s,t]} f(u)\,dE_u. \tag{27.27} \]


Proof Since $E(s,t]$ is a continuous linear operator Eq. (27.26) implies
\[ E(s,t] \int_a^b f(u)\,dE_u = \lim_{|Z|\to 0} E(s,t]\,\Sigma(f,Z), \]
where $Z$ denotes a partition of the interval $[a,b]$. The definition of these approximating
sums gives $E(s,t]\,\Sigma(f,Z) = E(s,t] \sum_{j=1}^{n} f(t_j')\,E(t_{j-1},t_j]$. Taking the defining
properties of a spectral family into account we calculate
\[ E(s,t]\,E(t_{j-1},t_j] = E(t_{j-1},t_j]\,E(s,t] = E((t_{j-1},t_j] \cap (s,t]). \]

We deduce $\lim_{|Z|\to 0} E(s,t]\,\Sigma(f,Z) = \lim_{|Z|\to 0} \Sigma(f,Z)\,E(s,t]$ and the first identity
in Eq. (27.27) is established.

For the second identity some care has to be taken with regard to the interval to
which the partitions refer. Therefore, we write this explicitly in the approximating
sums $\Sigma(f,Z) = \Sigma(f,Z,[a,b])$ when partitions of the interval $[a,b]$ are used. In
this way we write
\[ \Sigma(f,Z)\,E(s,t] = \Sigma(f,Z,[a,b])\,E(s,t] = \sum_{j=1}^{n} f(t_j')\,E(t_{j-1},t_j]\,E(s,t] \]
\[ = \sum_{j=1}^{n} f(t_j')\,E((t_{j-1},t_j] \cap (s,t]) = \Sigma(f,Z',[a,b] \cap (s,t]), \]
where $Z'$ is the partition induced by the given partition $Z$ on the subinterval $[a,b] \cap
(s,t]$. Clearly, $|Z| \to 0$ implies $|Z'| \to 0$ and thus
\[ \lim_{|Z|\to 0} \Sigma(f,Z)\,E(s,t] = \lim_{|Z'|\to 0} \Sigma(f,Z',[a,b] \cap (s,t]) = \int_{(a,b] \cap (s,t]} f(u)\,dE_u, \]
and we conclude.

For the spectral representation of self-adjoint operators and for other problems one
needs not only integrals over finite intervals but also integrals over the real line $\mathbb{R}$
which are naturally defined as the limit of integrals over finite intervals $[a,b]$ as
$a \to -\infty$ and $b \to +\infty$:
\[ \int_{-\infty}^{\infty} f(t)\,dE_t\, x = \lim_{a\to -\infty,\ b\to +\infty} \int_a^b f(t)\,dE_t\, x \equiv \lim_{a,b} \int_a^b f(t)\,dE_t\, x \tag{27.28} \]
for all $x \in \mathcal{H}$ for which this limit exists. The existence of this vector valued integral
is characterized by the existence of a numerical Stieltjes integral:


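Lemma 27.7 Suppose $E_t$, $t \in \mathbb{R}$, is a spectral family in the complex Hilbert space
$\mathcal{H}$ and $f : \mathbb{R} \to \mathbb{R}$ a continuous function. For $x \in \mathcal{H}$ the integral
\[ \int_{-\infty}^{\infty} f(t)\,dE_t\, x \]
exists if, and only if, the numerical integral
\[ \int_{-\infty}^{\infty} |f(t)|^2\,d\|E_t x\|^2 \]
exists.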


Proof The integral $\int_a^b f(t)\,dE_t\, x$ has a limit for $b \to +\infty$ if, and only if, for every
$\varepsilon > 0$ there is $b_0$ such that for all $b' > b \ge b_0$,
\[ \Big\|\int_b^{b'} f(t)\,dE_t\, x\Big\|^2 \le \varepsilon^2. \]

Part e) of Theorem 27.3 implies
\[ \Big\|\int_b^{b'} f(t)\,dE_t\, x\Big\|^2 = \int_b^{b'} |f(t)|^2\,d\rho_x(t), \]
where $d\rho_x(t) = d\|E_t x\|^2$. Thus, the vector valued integral has a limit for $b \to \infty$
if, and only if, the numerical, i.e., real valued integral does.
In the same way the limit $a \to -\infty$ is handled.

Finally, the integral of a continuous real valued function on the real line with respect 

to a spectral family is defined and its main properties are investigated. 




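Theorem 27.4 Let $E_t$, $t \in \mathbb{R}$, be a spectral family on the complex Hilbert space $\mathcal{H}$
and $f : \mathbb{R} \to \mathbb{R}$ a continuous function. Define
\[ D = \Big\{x \in \mathcal{H} : \int_{-\infty}^{+\infty} |f(t)|^2\,d\|E_t x\|^2 < \infty\Big\} \tag{27.29} \]
\[ \phantom{D} = \Big\{x \in \mathcal{H} : \int_{-\infty}^{+\infty} f(t)\,dE_t\, x \text{ exists}\Big\} \tag{27.30} \]
and on this domain $D$ define an operator $A$ by
\[ Ax = \int_{-\infty}^{+\infty} f(t)\,dE_t\, x \quad \forall\, x \in D. \tag{27.31} \]

Then this operator $A$ is self-adjoint and satisfies
\[ E(s,t]\,A \subseteq A\,E(s,t] \quad \forall\, s < t. \tag{27.32} \]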


Proof According to Lemma 27.7, the two characterizations of the set $D$ are equivalent.
The second characterization and the basic rules of calculation for limits show
that the set $D$ is a linear subspace of $\mathcal{H}$. In order to prove that $D$ is dense in the
Hilbert space we construct a subset $D_0 \subseteq D$ for which it is easy to show that it is
dense.

Denote $P_n = E_n - E_{-n}$ for $n \in \mathbb{N}$ and recall the normalization of a spectral
family: $P_n x = E_n x - E_{-n} x \to x - 0$ as $n \to \infty$, for every $x \in \mathcal{H}$. This implies that
$D_0 = \bigcup_{n\in\mathbb{N}} P_n \mathcal{H}$ is dense in $\mathcal{H}$. Now take any $x = P_n x \in D_0$ for some fixed $n \in \mathbb{N}$.
In order to prove $x \in D$ we rely on the second characterization of the space $D$ and
then have to show that $\lim_{a,b} \int_a^b f(t)\,dE_t\, x$ exists in $\mathcal{H}$. This is achieved by Lemma
27.6 and Theorem 27.3:
\[ \lim_{a,b} \int_a^b f(t)\,dE_t\, x = \lim_{a,b} \int_{(a,b]} f(t)\,dE_t\, E(-n,n]x \]
\[ = \lim_{a,b} \int_{(a,b] \cap (-n,n]} f(t)\,dE_t\, x = \int_{(-n,n]} f(t)\,dE_t\, x. \]
Since the last integral exists, $x = P_n x$ belongs to the space $D$. We conclude that $A$
is a densely defined linear operator.

Similarly, for $x \in D$, Lemma 27.6 implies
\[ P_n Ax = P_n \lim_{a,b} \int_a^b f(t)\,dE_t\, x = \lim_{a,b} P_n \int_a^b f(t)\,dE_t\, x = \lim_{a,b} \int_a^b f(t)\,dE_t\, P_n x = AP_n x, \]
i.e., $P_n A \subseteq AP_n$ and thus $AP_n = P_n A P_n$ for all $n \in \mathbb{N}$. In the same way we can prove
relation (27.32).

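For all $x, y \in D$ one has, using self-adjointness of $\int_a^b f(u)\,dE_u$, according to part d)
of Theorem 27.3,
\[ \langle x, Ay\rangle = \Big\langle x, \lim_{a,b} \int_a^b f(u)\,dE_u\, y\Big\rangle = \lim_{a,b} \Big\langle x, \int_a^b f(u)\,dE_u\, y\Big\rangle \]
\[ = \lim_{a,b} \Big\langle \int_a^b f(u)\,dE_u\, x, y\Big\rangle = \Big\langle \lim_{a,b} \int_a^b f(u)\,dE_u\, x, y\Big\rangle = \langle Ax, y\rangle, \]
hence $A \subseteq A^*$ and $A$ is symmetric.

In order to prove that $A$ is actually self-adjoint take any element $y \in D(A^*)$. Then
$y^* = A^* y \in \mathcal{H}$ and $A^* y = \lim_{n\to\infty} P_n A^* y$. For all $x \in \mathcal{H}$ we find $\langle P_n A^* y, x\rangle =
\langle A^* y, P_n x\rangle = \langle y, AP_n x\rangle = \langle y, P_n A P_n x\rangle = \langle P_n A P_n y, x\rangle$ where we used $P_n x \in D$, the
symmetry of $A$ and the relation $AP_n = P_n A P_n$ established earlier. It follows that
\[ P_n A^* y = P_n A P_n y = AP_n y \quad \forall\, n \in \mathbb{N}. \]

According to the definition of the operator $A$ and our earlier calculations, $AP_n y$ is
expressed as
\[ AP_n y = \int_{-n}^{n} f(u)\,dE_u\, y \quad \forall\, n \in \mathbb{N}. \]
The limit $n \to \infty$ of this integral exists because of the relation $AP_n y = P_n A^* y$. The
second characterization of the domain $D$ thus states $y \in D$ and therefore $AP_n y \to Ay$
as $n \to \infty$. We conclude that $A^* y = Ay$ and $A$ is self-adjoint.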


27.3 The Spectral Theorem


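Theorem 27.5 (Spectral Theorem) Every self-adjoint operator $A$ on the complex
Hilbert space $\mathcal{H}$ has a unique spectral representation, i.e., there is a unique spectral
family $E_t = E_t^A$, $t \in \mathbb{R}$, on $\mathcal{H}$ such that
\[ D(A) = \Big\{x \in \mathcal{H} : \int_{\mathbb{R}} t^2\,d\|E_t x\|^2 < \infty\Big\}, \qquad Ax = \int_{\mathbb{R}} t\,dE_t\, x \quad \forall\, x \in D(A). \tag{27.33} \]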

Proof At first we give the proof for the special case A > 0 in detail. At the end the 
general case is addressed by using an additional limiting procedure. 

For the self-adjoint operator A > 0 introduce the subspaces of controlled growth 
F(A,t), t >, as in Eq. (27.10) and then define for t € R, 


F(A,t) t>0, 
H, = (27.34) 
{0} t <0. 


According to Lemma 27.3, this is a family of closed linear subspaces of $\mathcal{H}$, where each subspace is invariant under the operator $A$. We claim that this family of subspaces satisfies conditions a)–c) of Proposition 27.1. Condition a) of monotonicity is evident from the definition of the spaces $\mathcal{H}_t$. Condition b) of right continuity, $\mathcal{H}_s = \bigcap_{s<t} \mathcal{H}_t$, is obtained in the following way: By monotonicity we know $F(A,s) \subseteq \bigcap_{s<t} F(A,t)$ for $s \ge 0$. Conversely suppose that $x \in \bigcap_{s<t} F(A,t) \subset D(A)$ is given; then $\|A^n x\| \le t^n \|x\|$ for all $t > s$ and all $n \in \mathbb{N}$ and thus $\|A^n x\| \le s^n \|x\|$ for all $n \in \mathbb{N}$, i.e., $x \in F(A,s)$. For $s < 0$ this is trivial.

Finally, we prove the normalization condition c). $\bigcap_{t \in \mathbb{R}} \mathcal{H}_t = \{0\}$ trivially holds because of the definition (27.34). The second part of the normalization condition,

$$\overline{\bigcup_{t \in \mathbb{R}} \mathcal{H}_t} = \mathcal{H},$$

follows from the geometric characterization of self-adjointness, Theorem 27.2.
Now we can use Proposition 27.1 to define a spectral family $E_t$, $t \in \mathbb{R}$, such that

$$\operatorname{ran} E_t = \mathcal{H}_t \qquad \forall\, t \in \mathbb{R}. \tag{27.35}$$
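To make the construction concrete, here is a small finite-dimensional sketch (an illustration, not taken from the text). It assumes that $F(A,t)$ consists of the vectors with $\|A^n x\| \le t^n \|x\|$ for all $n$, as used in the argument above, and the matrix $A$ is a hypothetical example.

```latex
% Assumed example: H = C^2, A = diag(1, 3), x = (x_1, x_2).
\[
  \|A^n x\|^2 = |x_1|^2 + 9^n |x_2|^2 .
\]
% The growth condition ||A^n x|| <= t^n ||x|| for all n then gives
\[
  \mathcal{H}_t = F(A,t) =
  \begin{cases}
    \{0\}                 & t < 1, \\
    \mathbb{C}\, e_1      & 1 \le t < 3, \\
    \mathbb{C}^2          & t \ge 3,
  \end{cases}
\]
% so E_t jumps exactly at the eigenvalues, and
\[
  \int_{\mathbb{R}} t\, dE_t
  = 1 \cdot (E_1 - E_{1-0}) + 3 \cdot (E_3 - E_{3-0}) = A .
\]
```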


In particular the choice $t = n \in \mathbb{N}$ gives a sequence of projectors $E_n$ with strong limit $I$ and with range $\mathcal{H}_n = F(A,n) \subset D^\infty(A)$ which is invariant under the operator $A$. It follows that $E_n A \subseteq A E_n$, hence $A E_n = E_n A E_n$, and therefore the domain of $A$ is characterized by Lemma 27.1, i.e., $x \in D(A) \Leftrightarrow (A E_n x)_{n \in \mathbb{N}}$ converges in $\mathcal{H} \Leftrightarrow (\|A E_n x\|)_{n \in \mathbb{N}}$ converges in $\mathbb{R}$, and then

$$Ax = \lim_{n \to \infty} A E_n x \qquad \forall\, x \in D(A).$$

Denote the restriction of the operator $E_n A E_n$ to the invariant subspace $\mathcal{H}_n$ by $A_n$. $A_n$ is a self-adjoint positive operator for which we will show the spectral representation with respect to the spectral family $E_t^n = E_t E_n$ on $\mathcal{H}_n$. Given a partition $Z = Z(t_j, t_j')$ of the interval $[0,n]$ and $x \in \mathcal{H}_n$, introduce the points $x_j = E(t_{j-1}, t_j]x \in F(A_n, t_j) \cap F(A_n, t_{j-1})^\perp$, $j = 1, \ldots, m$. Since different subintervals of the partition are disjoint, $x$ is the orthogonal sum of the points $x_j$. Note also that the operator $A_n$ leaves the subspaces $F(n,j) = F(A_n, t_j) \cap F(A_n, t_{j-1})^\perp$ invariant and that different subspaces of this kind are orthogonal to each other. This implies


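$$\langle x, A_n x\rangle = \sum_{j=1}^m \langle x_j, A_n x_j\rangle \qquad\text{and}\qquad \|A_n x\|^2 = \sum_{j=1}^m \|A_n x_j\|^2.$$

Theorem 27.1 allows us to estimate $\langle x_j, A_n x_j\rangle$ and $\|A_n x_j\|$ as follows:

$$t_{j-1} \|x_j\|^2 \le \langle x_j, A_n x_j\rangle \le t_j \|x_j\|^2,$$

$$t_{j-1} \|x_j\| \le \|A_n x_j\| \le t_j \|x_j\|.$$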


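These estimates hold for $j = 1, \ldots, m$ and therefore

$$\Big|\langle x, A_n x\rangle - \sum_{j=1}^m t_j' \langle x_j, x_j\rangle\Big| = \Big|\sum_{j=1}^m \big[\langle x_j, A_n x_j\rangle - t_j' \langle x_j, x_j\rangle\big]\Big| \le \sum_{j=1}^m \big|\langle x_j, A_n x_j\rangle - t_j' \langle x_j, x_j\rangle\big| \le \sum_{j=1}^m (t_j - t_{j-1}) \|x_j\|^2 \le |Z| \sum_{j=1}^m \|x_j\|^2 = |Z|\, \|x\|^2$$

and

$$\Big|\|A_n x\|^2 - \sum_{j=1}^m t_j'^2 \|x_j\|^2\Big| = \Big|\sum_{j=1}^m \big[\|A_n x_j\|^2 - t_j'^2 \|x_j\|^2\big]\Big| \le \sum_{j=1}^m \big|\|A_n x_j\|^2 - t_j'^2 \|x_j\|^2\big| \le 2n|Z| \sum_{j=1}^m \|x_j\|^2 = 2n|Z|\, \|x\|^2.$$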
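The factor $2n|Z|$ in the second estimate can be traced as follows. This is a sketch of the intermediate step, assuming that $|Z| = \max_j (t_j - t_{j-1})$ denotes the mesh of the partition and that the tags satisfy $t_{j-1} \le t_j' \le t_j$.

```latex
% Both ||A_n x_j||^2 and t_j'^2 ||x_j||^2 lie in the interval
% [ t_{j-1}^2 ||x_j||^2 , t_j^2 ||x_j||^2 ], hence
\[
  \bigl|\, \|A_n x_j\|^2 - t_j'^2 \|x_j\|^2 \bigr|
  \le (t_j^2 - t_{j-1}^2)\, \|x_j\|^2
  = (t_j - t_{j-1})(t_j + t_{j-1})\, \|x_j\|^2
  \le |Z| \cdot 2n \cdot \|x_j\|^2 ,
\]
% where t_j + t_{j-1} <= 2n because all partition points lie in [0, n].
```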

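Since $x_j = E(t_{j-1}, t_j]x$ we have $\|x_j\|^2 = \rho_x(t_j) - \rho_x(t_{j-1})$ and thus $\sum_{j=1}^m t_j'^2 \|x_j\|^2 = \Sigma(t^2, Z, \rho_x)$ is the approximating sum for the Stieltjes integral $\int_0^n t^2\, d\rho_x(t)$. The above estimate implies

$$\|A_n x\|^2 = \lim_{|Z| \to 0} \Sigma(t^2, Z, \rho_x) = \int_0^n t^2\, d\rho_x(t) = \int_0^n t^2\, d\|E_t x\|^2$$

and similarly

$$\langle x, A_n x\rangle = \lim_{|Z| \to 0} \Sigma(t, Z, \rho_x) = \int_0^n t\, d\rho_x(t) = \int_0^n t\, d\langle x, E_t x\rangle.$$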
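The approximating sums can be tried out numerically. The following is a minimal sketch, not the authors' construction: it assumes the toy distribution function $\rho_x(t) = t$ on $[0,2]$ (which arises, for instance, for $x \equiv 1$ in $L^2(0,2)$ and $A$ the multiplication by $t$), and the helper names `stieltjes_sum` and `uniform_partition` are hypothetical.

```python
def stieltjes_sum(f, rho, points, tags):
    """Approximating sum: sum_j f(t'_j) * (rho(t_j) - rho(t_{j-1}))."""
    total = 0.0
    for left, right, tag in zip(points, points[1:], tags):
        total += f(tag) * (rho(right) - rho(left))
    return total


def uniform_partition(upper, m):
    """Partition of [0, upper] into m equal cells, with midpoints as tags t'_j."""
    points = [upper * j / m for j in range(m + 1)]
    tags = [0.5 * (a + b) for a, b in zip(points, points[1:])]
    return points, tags


# Assumed toy model: rho(t) = t on [0, 2].
rho = lambda t: t
points, tags = uniform_partition(2.0, 1000)
norm_sq = stieltjes_sum(lambda t: t * t, rho, points, tags)  # approximates 8/3
expectation = stieltjes_sum(lambda t: t, rho, points, tags)  # approximates 2
```

In this model the two sums approximate $\|Ax\|^2 = \int_0^2 t^2\,dt$ and $\langle x, Ax\rangle = \int_0^2 t\,dt$; refining the partition shrinks the error in line with the bounds $2n|Z|\,\|x\|^2$ and $|Z|\,\|x\|^2$.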


The polarization identity (see Proposition 15.2) implies $\langle y, A_n x\rangle = \int_0^n t\, d\langle y, E_t x\rangle$ for all $y \in \mathcal{H}$ and therefore

$$A_n x = \int_0^n t\, dE_t x \qquad \forall\, x \in \mathcal{H}_n.$$

Recall $A_n x = E_n A E_n x = A E_n x$ for all $x \in \mathcal{H}_n$, and thus the above calculations show that

$$\|A E_n x\|^2 = \int_0^n t^2\, d\|E_t x\|^2, \qquad A E_n x = \int_0^n t\, dE_t x \tag{27.36}$$

for all $n \in \mathbb{N}$ and all $x \in \mathcal{H}$. For the sequence of projectors $E_n$ the hypotheses of Lemma 27.1 have been verified. Hence, $x \in D(A)$ if, and only if, $A E_n x$ has a limit, and if $x \in D(A)$ then the limit for $n \to \infty$ is $Ax$. Therefore, the vector-valued integral $\int_0^n t\, dE_t x$ has a limit for $n \to \infty$, and we conclude by Eq. (27.36) that Eq. (27.33) holds for the spectral family defined by the family of subspaces of controlled growth.

Finally, we show that there is only one spectral family which represents the self-adjoint operator $A$ according to Eq. (27.33) by showing: If $E_t'$, $t \in \mathbb{R}$, is a spectral family on $\mathcal{H}$ which represents the operator $A$ according to this equation, then

$$\operatorname{ran} E_t' = F(A,t) \qquad \forall\, t \ge 0.$$

Suppose $x \in \operatorname{ran} E_t'$ for some $t \ge 0$. Then $x = E_t' x$ and thus $E_s' x = E_s' E_t' x$ for all $s \ge 0$. Now calculate for any $n \in \mathbb{N}$,

$$\|A^n x\|^2 = \int_0^\infty s^{2n}\, d\|E_s' x\|^2 = \int_0^\infty s^{2n}\, d\|E_s' E_t' x\|^2 = \int_0^t s^{2n}\, d\|E_s' x\|^2 \le t^{2n} \int_0^\infty d\|E_s' x\|^2 = t^{2n} \|x\|^2.$$

It follows that $x \in F(A,t)$ and thus $\operatorname{ran} E_t' \subseteq F(A,t)$. Since $F(A,0) = N(A)$ it suffices to consider the case $t > 0$. Thus suppose $t > 0$ and $x \in F(A,t) \cap \operatorname{ran}(I - E_t')$; then $x = (I - E_t')x = \lim_{N \to \infty} E'(t,N]x$. As earlier we find

$$A^n E'(t,N]x = \int_0^\infty s^n\, dE_s' E'(t,N]x = \int_t^N s^n\, dE_s' x$$

and therefore

$$\|A^n x\|^2 = \int_t^\infty s^{2n}\, d\|E_s' x\|^2 \ge t^{2n} \int_t^\infty d\|E_s' x\|^2 = t^{2n} \|x\|^2,$$

where in the last step $x = (I - E_t')x$ was taken into account. $x \in F(A,t)$ implies $\|A^n x\|^2 \le t^{2n} \|x\|^2$ for all $n \in \mathbb{N}$. We conclude $\|A^n x\|^2 = t^{2n} \|x\|^2$ for all $n \in \mathbb{N}$. In terms of the spectral family this reads $\int_t^\infty (s^{2n} - t^{2n})\, d\|E_s' x\|^2 = 0$, hence $x = (I - E_t')x = 0$ and $\operatorname{ran} E_t' = F(A,t)$ follows. This concludes the proof for the case $A \ge 0$.

Comments on the proof for the general case: a) If $A$ is a lower bounded self-adjoint operator, i.e., for some $c \in \mathbb{R}$ one has $A \ge -cI$, then $A_c = A + cI$ is a positive self-adjoint operator for which the above proof applies and produces the spectral representation $A_c = \int_{\mathbb{R}} t\, dE_t$. In the Exercises, we deduce the spectral representation for the operator $A$ itself.

b) The proof of the spectral representation of a self-adjoint operator $A$ which is not lower bounded needs an additional limit process which we indicate briefly.

As in the case of lower bounded self-adjoint operators the subspaces of controlled growth $F(A,t)$ are well defined and have the properties stated in Lemma 27.3. In particular for $t = n \in \mathbb{N}$ we have closed subspaces of $\mathcal{H}$ which are contained in the domain of $A$ and which are invariant under the operator $A$. Hence, the orthogonal projectors $P_n$ onto these subspaces satisfy $\operatorname{ran} P_n \subseteq D(A)$ and $A P_n = P_n A P_n$. Furthermore, by the geometric characterization of self-adjointness (Theorem 27.2) the union of the ranges of these projectors is dense in $\mathcal{H}$ and therefore $\lim_{n \to \infty} P_n x = x$ for all $x \in \mathcal{H}$. Under the inner product of the Hilbert space $\mathcal{H}$ the closed subspaces $F(A,n)$ are Hilbert spaces too, and $A_n = A|_{F(A,n)}$ is a self-adjoint operator which is bounded from below: $A_n + nI \ge 0$. Hence for the operator $A_n$ our earlier results apply.
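The lower bound for $A_n$ follows from the defining growth estimate in one line. The sketch below assumes only that every $x \in F(A,n)$ satisfies $\|Ax\| \le n\|x\|$, which is the case $k = 1$ of the controlled-growth condition.

```latex
% For x in F(A, n), Cauchy-Schwarz and ||A x|| <= n ||x|| give
\[
  \langle x, (A_n + nI)x\rangle
  = \langle x, Ax\rangle + n\|x\|^2
  \ge -\|x\|\,\|Ax\| + n\|x\|^2
  \ge -n\|x\|^2 + n\|x\|^2 = 0 .
\]
```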

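In the Hilbert space $F(A,n)$ define a spectral family $E_n(t)$, $t \in \mathbb{R}$, by

$$\operatorname{ran} E_n(t) = \begin{cases} \{0\} & t < -n, \\ F(A_n + nI,\, t + n) & -n \le t. \end{cases}$$

Then the spectral representation for the operator $A_n$ in the space $F(A,n)$ reads

$$A_n = \int_{\mathbb{R}} t\, dE_n(t). \tag{27.37}$$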
R 


This holds for each $n \in \mathbb{N}$. A suitable limit of the spectral families $E_n(\cdot)$, $n \in \mathbb{N}$, will produce the spectral representation for the operator $A$. To this end one observes

$$E_k(t) P_n = E_n(t) P_n = P_n E_n(t) = E_n(t), \qquad \operatorname{ran} E_k([-n,n]) = F(A_k, n) = F(A,n)$$

for all $t \in \mathbb{R}$ and all $k \ge n$, and then proves that the sequence of spectral families $E_n$ has a strong limit $E(\cdot)$ which is a spectral family in the Hilbert space $\mathcal{H}$:

$$E(t)x = \lim_{n \to \infty} E_n(t)x \qquad \forall\, x \in \mathcal{H},$$

uniformly in $t \in \mathbb{R}$. This spectral family satisfies $E([-n,n]) = P_n$.
The spectral representation (27.37) implies for all $x, y \in \mathcal{H}$, all $n \in \mathbb{N}$ and all $k \ge n$,

$$\langle x, A P_n y\rangle = \langle P_n x, A_n P_n y\rangle = \int_{\mathbb{R}} t\, d\langle P_n x, E_n(t) P_n y\rangle = \int_{\mathbb{R}} t\, d\langle P_n x, E_k(t) P_n y\rangle = \int_{[-n,n]} t\, d\langle x, E(t) y\rangle$$

and similarly, for all $x \in \mathcal{H}$ and all $n \in \mathbb{N}$,

$$\|A P_n x\|^2 = \int_{[-n,n]} t^2\, d\langle x, E(t) x\rangle.$$

Finally, another application of Lemma 27.1 proves the spectral representation (27.33) for the general case.

c) The proof that the spectral family is uniquely determined by the self-adjoint operator $A$ uses the same basic idea in the general case as in the case of positive self-adjoint operators. In this case, one proves (see Exercises): If a spectral family $E'$ represents the self-adjoint operator $A$, then

$$\operatorname{ran} E'(s,t] = F\Big(A - \frac{s+t}{2} I,\ \frac{t-s}{2}\Big)$$

for all $s < t$. Since projectors are determined by their range, uniqueness of the spectral family follows.


27.4 Some Applications 


For a closed symmetric operator $A$ in the complex Hilbert space $\mathcal{H}$ we can form the subspaces $F(A,r)$, $r \ge 0$, of controlled growth. These subspaces are all contained in the domain $D(A)$ and are invariant under the operator $A$. The closure of the union $M = \bigcup_{n \in \mathbb{N}} F(A,n)$ of these subspaces is a subspace $\mathcal{H}_0$ of the Hilbert space and, according to the geometric characterization of self-adjointness, the operator $A$ is self-adjoint if, and only if, $\mathcal{H}_0 = \mathcal{H}$.

Now suppose that $A$ is not self-adjoint. Then $\mathcal{H}_0$ is a proper subspace of $\mathcal{H}$. Since the space $M$ is invariant under $A$ and dense in $\mathcal{H}_0$ one would naturally expect that the restriction $A_0$ of the operator $A$ to the subspace $\mathcal{H}_0$ is a self-adjoint operator in the Hilbert space $\mathcal{H}_0$. This is indeed the case and the self-adjoint operator $A_0$ defined in this way is called the maximal self-adjoint part of the closed symmetric operator $A$. With the help of the geometric characterization of self-adjointness the proof is straightforward but some terminology has to be introduced.


Definition 27.2 Let $A$ be a linear operator in the complex Hilbert space $\mathcal{H}$ and $\mathcal{H}_0$ a closed linear subspace. $\mathcal{H}_0$ is an invariant subspace of the operator $A$ if, and only if, the operator $A$ maps $D(A) \cap \mathcal{H}_0$ into $\mathcal{H}_0$.

A closed linear subspace $\mathcal{H}_0$ is a reducing subspace of the operator $A$ if, and only if, $\mathcal{H}_0$ is an invariant subspace of $A$ and the orthogonal projector $P_0$ onto the subspace $\mathcal{H}_0$ maps the domain $D(A)$ into itself, $P_0 D(A) \subseteq D(A)$.

Thus a closed linear subspace $\mathcal{H}_0$ reduces the linear operator $A$ if, and only if, a) $P_0 x \in D(A)$ and b) $A P_0 x \in \mathcal{H}_0$ for all $x \in D(A)$. Both conditions can be expressed through the condition that $A P_0$ is an extension of the operator $P_0 A$, i.e.,

$$P_0 A \subseteq A P_0. \tag{27.38}$$

Clearly, if $\mathcal{H}_0$ is a reducing subspace of the operator $A$, the restriction $A_0$ to the subspace $\mathcal{H}_0$ is a well-defined linear operator in the Hilbert space $\mathcal{H}_0$.
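A simple instance of a reducing subspace, given here as an illustration outside the original text: the multiplication operator $M_g$ and the Borel set $\Omega$ below are assumptions chosen for the example.

```latex
% Assumed example: H = L^2(R), A = M_g with a real measurable function g,
% D(A) = { f in L^2(R) : g f in L^2(R) }.
% For a Borel set Omega put
\[
  \mathcal{H}_0 = \{ f \in L^2(\mathbb{R}) : f = 0 \text{ a.e. outside } \Omega \},
  \qquad P_0 f = \chi_\Omega f .
\]
% For f in D(A) one has g (chi_Omega f) = chi_Omega (g f) in L^2(R), hence
\[
  P_0 f \in D(A), \qquad A P_0 f = \chi_\Omega\, g f = P_0 A f \in \mathcal{H}_0 ,
\]
% which is exactly the relation P_0 A \subseteq A P_0 of (27.38).
```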


Definition 27.3 Let $A$ be a linear operator in the Hilbert space $\mathcal{H}$. The restriction $A_0$ of $A$ to a reducing subspace $\mathcal{H}_0$ is called the maximal self-adjoint part of $A$ if, and only if, $A_0$ is self-adjoint in the Hilbert space $\mathcal{H}_0$, and if $\mathcal{H}_1$ is any other reducing subspace of $A$ on which the restriction $A_1$ of $A$ is self-adjoint, then $\mathcal{H}_1 \subseteq \mathcal{H}_0$ and $A_1 \subseteq A_0$.


Theorem 27.6 (Maximal Self-Adjoint Part) Every closed symmetric operator $A$ in a complex Hilbert space $\mathcal{H}$ has a maximal self-adjoint part $A_0$. $A_0$ is defined as the restriction of $A$ to the closure $\mathcal{H}_0$ of the union $M = \bigcup_{n \in \mathbb{N}} F(A,n)$ of the subspaces of controlled growth.


Proof Denote by $P_n$ the orthogonal projector onto the closed invariant subspace $F(A,n)$, $n \in \mathbb{N}$. The sequence of projectors is monotone increasing and thus has a strong limit $Q_0$, $Q_0 x = \lim_{n \to \infty} P_n x$ for all $x \in \mathcal{H}$. The range of $Q_0$ is the closure of the union of the ranges of the projectors $P_n$, i.e., $\operatorname{ran} Q_0 = \mathcal{H}_0$.

In order to show that $\mathcal{H}_0$ is a reducing subspace of $A$, recall that $P_n A x = A P_n x$ for all $x \in D(A)$, a property which has been used before on several occasions. For $n \to \infty$ the left-hand side of this identity converges to $Q_0 A x$, hence the right-hand side $A P_n x$ converges too for $n \to \infty$. We know $\lim_{n \to \infty} P_n x = Q_0 x$ and $A$ is closed, hence $Q_0 x \in D(A)$ and $A Q_0 x = Q_0 A x$ for all $x \in D(A)$. Thus, $\mathcal{H}_0$ is a reducing subspace.

Consider the restriction $A_0 = A|_{D(A) \cap \mathcal{H}_0}$ of $A$ to the reducing subspace $\mathcal{H}_0$. It follows easily that $A_0$ is closed and symmetric and that the subspaces of controlled growth coincide: $F(A_0,n) = F(A,n)$ for all $n \in \mathbb{N}$ (see Exercises). Hence, the geometric characterization of self-adjointness proves $A_0$ to be self-adjoint.

Now let $\mathcal{H}_1$ be another reducing subspace of $A$ on which the restriction $A_1 = A|_{D(A) \cap \mathcal{H}_1}$ is self-adjoint. Theorem 27.2 implies

$$\mathcal{H}_1 = \overline{\bigcup_{n \in \mathbb{N}} F(A_1, n)}.$$

Since $A_1$ is a restriction of $A$ we know $F(A_1,n) \subseteq F(A,n)$ for all $n \in \mathbb{N}$ and thus $\mathcal{H}_1 \subseteq \mathcal{H}_0$ and therefore $A_1 \subseteq A_0$. We conclude that $A_0$ is the maximal self-adjoint part of $A$.
Another powerful application of the geometric characterization of self-adjointness is a set of convenient sufficient conditions for a symmetric operator to be essentially self-adjoint. The idea is to use lower bounds for the subspaces of controlled growth, and here considerable flexibility is available. This is very important since in practice it is nearly impossible to determine the subspaces of controlled growth explicitly.


Theorem 27.7 Let $A \subseteq A^*$ be a symmetric operator in the Hilbert space $\mathcal{H}$. If for every $n \in \mathbb{N}$ there is a subset $D(A,n) \subseteq F(A,n)$ of the subspaces of controlled growth such that their union $D_0(A) = \bigcup_{n \in \mathbb{N}} D(A,n)$ is a total subset of the Hilbert space $\mathcal{H}$, then $A$ is essentially self-adjoint.


Proof The closure $\bar{A}$ of $A$ is a closed symmetric operator. It is self-adjoint if, and only if, $\mathcal{H}_0(\bar{A}) = \bigcup_{n \in \mathbb{N}} F(\bar{A},n)$ is a dense subspace of $\mathcal{H}$. Obviously one has $F(A,n) \subseteq F(\bar{A},n)$ and thus $D_0(A) \subseteq \mathcal{H}_0(\bar{A})$. By assumption the set $D_0(A)$ is total in $\mathcal{H}$, hence $\mathcal{H}_0(\bar{A})$ is dense and thus $\bar{A}$ is self-adjoint.
We conclude this section by pointing out an interesting connection of the geometric characterization of self-adjointness with a classical result of Nelson which is discussed in detail in the book [2].

Let $A$ be a symmetric operator in the complex Hilbert space $\mathcal{H}$. $x \in D^\infty(A)$ is called an analytic vector of $A$ if, and only if, there is a constant $C_x < \infty$ such that

$$\|A^n x\| \le C_x^n\, n! \qquad \forall\, n \in \mathbb{N}.$$

Denote by $D^\omega(A)$ the set of all analytic vectors of $A$. Then Nelson’s theorem states that $A$ is essentially self-adjoint if, and only if, the space of all analytic vectors $D^\omega(A)$ is dense in $\mathcal{H}$. Furthermore, a closed symmetric operator $A$ is self-adjoint if, and only if, $D^\omega(A)$ is dense in $\mathcal{H}$. In the Exercises, we show

$$\mathcal{H}_0(A) = \bigcup_{n \in \mathbb{N}} F(A,n) \subseteq D^\omega(A).$$

Thus, Nelson’s results are easily understood in terms of the geometric characterization of self-adjointness.
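A sketch of why vectors of controlled growth are analytic, assuming the growth estimate $\|A^k x\| \le n^k \|x\|$ for $x \in F(A,n)$; the particular constant $C_x$ is one possible choice and the estimate is stated for $k \ge 1$.

```latex
% Let x in F(A, n) with n >= 1 and put C_x = n max{1, ||x||}. For every k >= 1,
\[
  \|A^k x\| \le n^k \|x\|
  \le n^k \max\{1, \|x\|\}^k
  = C_x^k \le C_x^k\, k! ,
\]
% so x satisfies the bound that defines an analytic vector.
```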


27.5 Exercises 


1. For a spectral family $E_t$, $t \in \mathbb{R}$, on a Hilbert space $\mathcal{H}$ prove

$$E(I_1) E(I_2) = E(I_1 \cap I_2)$$

for any intervals $I_j = (s_j, t_j]$. Here we use $E(\emptyset) = 0$.

2. Prove parts b)—d) of Theorem 27.3. 

3. Suppose a self-adjoint operator $A$ in a complex Hilbert space $\mathcal{H}$ has the spectral representation $A = \int_{\mathbb{R}} t\, dE_t$ with spectral family $E_t$, $t \in \mathbb{R}$. Let $c \in \mathbb{R}$ be a constant. Then find the spectral representation for the operator $A + cI$.

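4. Let $A$ be a self-adjoint operator in the complex Hilbert space $\mathcal{H}$ which is represented by the spectral family $E_t'$, $t \in \mathbb{R}$. Prove:

$$\operatorname{ran} E'(s,t] = F\Big(A - \frac{s+t}{2} I,\ \frac{t-s}{2}\Big)$$

for all $s < t$ and conclude uniqueness of the spectral family.
Hints: Recall that $\operatorname{ran} E'(s,t] = \operatorname{ran} E_t' \cap (\operatorname{ran} E_s')^\perp$ and prove first

$$\Big\|\Big(A - \frac{s+t}{2} I\Big)^n x\Big\|^2 = \int_s^t \Big|u - \frac{s+t}{2}\Big|^{2n} d\|E_u' x\|^2$$

for all $x \in \operatorname{ran} E'(s,t]$. Then one can proceed as in the case of a positive operator.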

5. Let $A$ be a closed symmetric operator in the Hilbert space $\mathcal{H}$ and let $\mathcal{H}_0$ be the closure of the union of the subspaces of controlled growth $F(A,n)$, $n \in \mathbb{N}$. $\mathcal{H}_0$ is known to be a reducing subspace of $A$. Prove: a) The restriction $A_0$ of $A$ to $\mathcal{H}_0$ is a closed and symmetric operator in $\mathcal{H}_0$; b) $F(A,n) = F(A_0,n)$ for all $n \in \mathbb{N}$.

6. Denote by $M_g$ the operator of multiplication with the real-valued piecewise continuous function $g$ in the Hilbert space $\mathcal{H} = L^2(\mathbb{R})$. Assume that for every $n \in \mathbb{N}$ there are nonnegative numbers $r_n$, $R_n$ such that $[-r_n, R_n] \subseteq \{x \in \mathbb{R} : |g(x)| \le n\}$. Prove: $M_g$ is self-adjoint on $D(M_g) = \{f \in L^2(\mathbb{R}) : g \cdot f \in L^2(\mathbb{R})\}$ if $r_n \to \infty$ and $R_n \to \infty$ as $n \to \infty$.

Hints: In Theorem 27.7, try the sets

$$D(M_g, n) = \{f \in L^2(\mathbb{R}) : \operatorname{supp} f \subseteq [-r_n, R_n]\}.$$


7. Consider the free Hamilton operator $H_0$ in momentum representation in the Hilbert space $\mathcal{H} = L^2(\mathbb{R}^3)$, i.e., $(H_0 \psi)(p) = \frac{p^2}{2m} \psi(p)$ for all $p \in \mathbb{R}^3$ and all $\psi \in D(H_0)$. Since we had shown earlier that $H_0$ is self-adjoint we know that it has a spectral representation. Determine this spectral representation explicitly.


References 


1. Leinfelder H. A geometric proof of the spectral theorem for unbounded self-adjoint operators. 
Math Ann. 1979;242:85-96. 

2. Reed M, Simon B. Fourier analysis. Self-adjointness. vol. 2 of Methods of modern mathematical 
physics. New York: Academic Press; 1975. 


Chapter 28 
Some Applications of the Spectral 
Representation 


For a self-adjoint operator $A$ in a complex Hilbert space $\mathcal{H}$, the spectral representation

$$A = \int_{\mathbb{R}} t\, dE_t$$


has many interesting consequences. We discuss some of these in this chapter. For a 
more comprehensive discussion of the meaning of the spectral theorem we refer to 
[1]. 

In Theorem 27.4 we learned to integrate functions with respect to a spectral family $E_t$, $t \in \mathbb{R}$. This applies in particular to the spectral family of a self-adjoint operator and thus allows us to define quite general functions $f(A)$ of a self-adjoint operator $A$. Some basic facts of this functional calculus are presented in the first section.

The next section introduces a detailed characterization of the different parts of the spectrum of a self-adjoint operator in terms of its spectral family. The different parts of the spectrum are distinguished by the properties of the measure $d\rho_x(t) = d\langle x, E_t x\rangle$ in relation to the Lebesgue measure and this leads to the different spectral subspaces of the operator.

Finally, we discuss the physical interpretation of the different parts of the spectrum 
for a self-adjoint Hamilton operator. 


28.1 Functional Calculus 


We restrict ourselves to the functional calculus for continuous functions though it 
can be extended to a much wider class, the Borel functions, through an additional 
limit process. 


Theorem 28.1 Let $A$ be a self-adjoint operator in the complex Hilbert space $\mathcal{H}$ and $E_t$, $t \in \mathbb{R}$, its spectral family. Denote by $C_b(\mathbb{R})$ the space of bounded continuous functions $g : \mathbb{R} \to \mathbb{C}$. Then for every $g \in C_b(\mathbb{R})$,

$$g(A) = \int_{\mathbb{R}} g(t)\, dE_t \tag{28.1}$$


© Springer International Publishing Switzerland 2015 419 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_28 




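is a well-defined bounded linear operator on $\mathcal{H}$ and $g \mapsto g(A)$ is a continuous algebraic *-homomorphism $C_b(\mathbb{R}) \to \mathcal{B}(\mathcal{H})$, i.e.,

a) $(a_1 g_1 + a_2 g_2)(A) = a_1 g_1(A) + a_2 g_2(A)$ for all $g_j \in C_b(\mathbb{R})$ and all $a_j \in \mathbb{C}$

b) $1(A) = I$

c) $g(A) f(A) = (g \cdot f)(A)$ for all $f, g \in C_b(\mathbb{R})$

d) $g(A)^* = \bar{g}(A)$ for all $g \in C_b(\mathbb{R})$

e) $\|g(A)\| \le \|g\|_\infty$ for all $g \in C_b(\mathbb{R})$


In addition the following holds:

1) $\mathrm{id}(A) = A$

2) $g \in C_b(\mathbb{R})$ and $g \ge 0$ implies $g(A) \ge 0$

3) If $g \in C_b(\mathbb{R})$ is such that $\frac{1}{g} \in C_b(\mathbb{R})$, then $g(A)^{-1} = \frac{1}{g}(A)$

4) $\sigma(g(A)) = \overline{g(\sigma(A))} = \overline{\{g(\lambda) : \lambda \in \sigma(A)\}}$ (spectral mapping theorem)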


Proof Theorem 27.3 and Lemma 27.7 easily imply that for every $g \in C_b(\mathbb{R})$ the operator $g(A)$ is a well-defined bounded linear operator on $\mathcal{H}$, since for all $x \in \mathcal{H}$ one has $\int_{\mathbb{R}} |g(u)|^2\, d\|E_u x\|^2 \le \|g\|_\infty^2 \int_{\mathbb{R}} d\|E_u x\|^2 = \|g\|_\infty^2 \|x\|^2$. Part (e) follows immediately. Parts (a), (b) and (d) also follow easily from a combination of Theorem 27.3 and Lemma 27.7. The proof of Part (c) is left as an exercise where some hints are given.

The first of the additional statements is just the spectral theorem. For the second we observe that for all $x \in \mathcal{H}$ one has $\langle x, g(A)x\rangle = \int_{\mathbb{R}} g(u)\, d\|E_u x\|^2 \ge 0$ and hence $g(A) \ge 0$. The third follows from the combination of (b) and (c). The proof of the spectral mapping theorem is left as an exercise for the reader.
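In finite dimensions the integral (28.1) reduces to a finite sum $g(A) = \sum_k g(\lambda_k) P_k$ over eigenvalues and eigenprojectors, which makes the homomorphism properties easy to check by hand or by machine. The following is a minimal sketch under that assumption; the matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and the helper names are hypothetical choices for illustration, not part of the text.

```python
# Spectral data of the symmetric matrix A = [[2, 1], [1, 2]] (assumed example):
# eigenvalues 1 and 3 with mutually orthogonal eigenprojectors.
EIGENVALUES = [1.0, 3.0]
PROJECTORS = [
    [[0.5, -0.5], [-0.5, 0.5]],  # projector onto span{(1, -1)}
    [[0.5, 0.5], [0.5, 0.5]],    # projector onto span{(1, 1)}
]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply_function(g):
    """Return g(A) = sum_k g(lambda_k) P_k, the finite analogue of (28.1)."""
    result = [[0j, 0j], [0j, 0j]]
    for lam, proj in zip(EIGENVALUES, PROJECTORS):
        for i in range(2):
            for j in range(2):
                result[i][j] += g(lam) * proj[i][j]
    return result


def max_abs_diff(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))
```

For example, `apply_function(lambda t: t)` reproduces the matrix itself (statement 1), and `matmul(apply_function(g), apply_function(f))` agrees with `apply_function(lambda t: g(t) * f(t))` up to rounding (part c), because the projectors satisfy $P_j P_k = \delta_{jk} P_k$.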


Corollary 28.1 Let $A$ be a self-adjoint operator in the Hilbert space $\mathcal{H}$ and $E_t$, $t \in \mathbb{R}$, its spectral family. Define

$$V(t) = e^{itA} = \int_{\mathbb{R}} e^{itu}\, dE_u \qquad \forall\, t \in \mathbb{R}. \tag{28.2}$$

Then $V(t)$ is a strongly continuous one-parameter group of unitary operators on $\mathcal{H}$ with generator $A$.


Proof $V(t)$ is defined as $e_t(A)$ where $e_t$ is the continuous bounded function $e_t(u) = e^{itu}$ for all $u \in \mathbb{R}$. These exponential functions $e_t$ satisfy $\bar{e}_t = e_{-t} = \frac{1}{e_t}$. Hence, parts (d) and (3) imply $(e_t(A))^* e_t(A) = e_t(A)(e_t(A))^* = I$. Furthermore, these functions satisfy $e_t \cdot e_s = e_{t+s}$ for all $t, s \in \mathbb{R}$ and $e_0 = 1$. Hence parts (b) and (c) imply $V(t)V(s) = V(t+s)$ and $V(0) = I$, thus $V(t)$ is a one-parameter group of unitary operators on $\mathcal{H}$.

For $x \in \mathcal{H}$ and $s, t \in \mathbb{R}$ we have $\|V(t+s)x - V(t)x\| = \|V(t)(V(s)x - x)\| = \|V(s)x - x\|$ and

$$\|V(s)x - x\|^2 = \int_{\mathbb{R}} |e^{isu} - 1|^2\, d\|E_u x\|^2.$$

Since $|e^{isu} - 1| \le 2$ and $|e^{isu} - 1| \to 0$ as $s \to 0$ for every $u$, a simple application of Lebesgue’s dominated convergence theorem implies $\|V(s)x - x\| \to 0$ for $s \to 0$. Therefore, the group $V(t)$ is strongly continuous.

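According to Stone’s Theorem 23.2, this group has a self-adjoint generator $B$ defined on $D = \{x \in \mathcal{H} : \exists \lim_{t \to 0} \frac{1}{t}(V(t)x - x)\}$ by $iBx = \lim_{t \to 0} \frac{1}{t}(V(t)x - x)$.

According to the spectral Theorem 27.5, a vector $x \in \mathcal{H}$ belongs to the domain of $A$ if, and only if, $\int_{\mathbb{R}} u^2\, d\|E_u x\|^2 < \infty$. Thus, by another application of Lebesgue’s dominated convergence theorem, we find that

$$\Big\|\frac{1}{t}(V(t)x - x)\Big\|^2 = \int_{\mathbb{R}} \Big|\frac{e^{itu} - 1}{t}\Big|^2 d\|E_u x\|^2$$

has a limit for $t \to 0$ since $\big|\frac{e^{itu} - 1}{t}\big|^2 \le u^2$ (recall $|e^{i\theta} - 1| = 2|\sin(\theta/2)| \le |\theta|$ for real $\theta$). We conclude that $D(A) \subseteq D$ and $A \subseteq B$. Since $A$ is self-adjoint this implies $A = B$.
The following corollary completes the proof of Stone’s theorem.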


Corollary 28.2 Let $U(t)$ be a strongly continuous one-parameter group of unitary operators on the complex Hilbert space $\mathcal{H}$ and $A$ its self-adjoint generator. Then

$$U(t) = e_t(A) = e^{itA} \qquad \forall\, t \in \mathbb{R}.$$

Proof We know already that the strongly continuous one-parameter group of unitary operators $V(t) = e_t(A)$ has the generator $A$ and that both $U(t)$ and $V(t)$ leave the domain $D$ of the generator $A$ invariant. For $x \in D$, introduce $x(t) = U(t)x - V(t)x \in D$ for all $t \in \mathbb{R}$. This function has the derivative $\frac{d}{dt} x(t) = iAU(t)x - iAV(t)x = iAx(t)$, and therefore,

$$\frac{d}{dt} \|x(t)\|^2 = \Big\langle \frac{d}{dt} x(t), x(t)\Big\rangle + \Big\langle x(t), \frac{d}{dt} x(t)\Big\rangle = \langle iAx(t), x(t)\rangle + \langle x(t), iAx(t)\rangle = 0$$

for all $t \in \mathbb{R}$. Since $x(0) = 0$ we conclude that $x(t) = 0$ for all $t \in \mathbb{R}$ and therefore the groups $U(t)$ and $V(t)$ agree on $D$. Since $D$ is dense this proves that $U(t)$ and $V(t)$ agree on $\mathcal{H}$.
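The group properties of $e^{itA}$ can be checked numerically in the simplest setting. The sketch below assumes a $2 \times 2$ symmetric matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ with eigenvalues 1 and 3, so that the spectral integral becomes a two-term sum; the matrix and all function names are hypothetical and only illustrate the statements proved above.

```python
import cmath

# Assumed example: eigenvalues and eigenprojectors of A = [[2, 1], [1, 2]].
EIGENVALUES = [1.0, 3.0]
PROJECTORS = [
    [[0.5, -0.5], [-0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]


def unitary_group(t):
    """V(t) = sum_k exp(i t lambda_k) P_k, the finite analogue of (28.2)."""
    return [[sum(cmath.exp(1j * t * lam) * proj[i][j]
                 for lam, proj in zip(EIGENVALUES, PROJECTORS))
             for j in range(2)] for i in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def adjoint(a):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def max_abs_diff(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))
```

With these helpers one can compare `matmul(unitary_group(t), unitary_group(s))` with `unitary_group(t + s)`, check that `matmul(adjoint(V), V)` is the identity, and observe that the difference quotient $(V(h) - I)/h$ approaches $iA$ for small $h$, in line with the definition of the generator.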


28.2 Decomposition of the Spectrum—Spectral Subspaces 


According to Weyl’s criterion (Theorem 28.5), a real number $\lambda$ belongs to the spectrum of a self-adjoint operator $A$ if, and only if, there is a sequence of unit vectors $x_n$ such that $\|(A - \lambda I)x_n\| \to 0$ as $n \to \infty$. The spectral theorem allows us to translate this criterion for the points of the spectrum of $A$ into properties of its spectral family $E$. This will be our starting point for this section. Then a number of consequences are investigated. When we relate the spectral measure $d\rho_x$, associated to the spectral family of $A$ and a vector $x \in \mathcal{H}$, to the Lebesgue measure $d\lambda$ on the real numbers we will obtain a finer decomposition of the spectrum $\sigma(A)$.


Theorem 28.2 Let $A$ be a self-adjoint operator in a complex Hilbert space $\mathcal{H}$ and $E_t$, $t \in \mathbb{R}$, its spectral family. Then the following holds:

a) $\mu \in \sigma(A) \Leftrightarrow E_{\mu+\varepsilon} - E_{\mu-\varepsilon} \ne 0$ for all $\varepsilon > 0$.
b) $\mu \in \mathbb{R}$ is an eigenvalue of $A \Leftrightarrow E(\{\mu\}) = E_\mu - E_{\mu-0} \ne 0$.


Proof Suppose that there is an $\varepsilon > 0$ such that $P = E_{\mu+\varepsilon} - E_{\mu-\varepsilon} = 0$. Then for any $x \in D(A)$ with $\|x\| = 1$ we find by the spectral theorem that

$$\|(A - \mu I)x\|^2 = \int_{|t-\mu| \ge \varepsilon} |t - \mu|^2\, d\|E_t x\|^2 \ge \varepsilon^2 \int_{|t-\mu| \ge \varepsilon} d\|E_t x\|^2 = \varepsilon^2 \|x\|^2 = \varepsilon^2 > 0,$$

since we can write $x = Px + (I - P)x = (I - P)x$. Thus, no sequence of unit vectors in $D(A)$ can satisfy Weyl’s criterion, hence $\mu \notin \sigma(A)$.
Conversely, if $P_n = E_{\mu + \frac{1}{n}} - E_{\mu - \frac{1}{n}} \ne 0$ for all $n \in \mathbb{N}$, then there is a sequence $x_n = P_n x_n$ in $D(A)$ with $\|x_n\| = 1$. For this sequence we have by the spectral theorem

$$\|(A - \mu I)x_n\|^2 = \int_{|t-\mu| \le \frac{1}{n}} |t - \mu|^2\, d\|E_t x_n\|^2 \le \frac{1}{n^2} \|x_n\|^2 = \frac{1}{n^2}$$

and thus this sequence satisfies Weyl’s criterion and therefore $\mu$ belongs to the spectrum of $A$. This proves Part (a).

Next, suppose that $\mu \in \mathbb{R}$ is an eigenvalue of $A$. Let $x \in D(A)$ be a normalized eigenvector. Again by the spectral representation, the identity

$$0 = \|(A - \mu I)x\|^2 = \int_{\mathbb{R}} |t - \mu|^2\, d\|E_t x\|^2$$

holds. In particular, for all $N \in \mathbb{N}$ and all $\varepsilon > 0$,

$$0 = \int_{\mu+\varepsilon}^N |t - \mu|^2\, d\|E_t x\|^2 \ge \varepsilon^2 \int_{\mu+\varepsilon}^N d\|E_t x\|^2 = \varepsilon^2 \|E(\mu+\varepsilon, N]x\|^2.$$

We conclude that $0 = E_N x - E_{\mu+\varepsilon} x$ and similarly $0 = E_{-N} x - E_{\mu-\varepsilon} x$ for all $N \in \mathbb{N}$ and all $\varepsilon > 0$. Now apply the normalization condition of a spectral family to conclude $x = E_{\mu+\varepsilon} x$ and $0 = E_{\mu-\varepsilon} x$ for all $\varepsilon > 0$. Using right continuity of a spectral family, this implies $x = (E_\mu - E_{\mu-0})x$, and the projector $E_\mu - E_{\mu-0}$ is not zero.

When we know that the projector $P = E_\mu - E_{\mu-0}$ is not zero, then there is a $y \in \mathcal{H}$ such that $y = Py$ and $\|y\| = 1$. It follows that $y \in D(A)$ and $E_t y = y$ for $t \ge \mu$ and $E_t y = 0$ for $t < \mu$, hence $\|(A - \mu I)y\|^2 = \int_{\mathbb{R}} |t - \mu|^2\, d\|E_t y\|^2 = 0$, i.e., $(A - \mu I)y = 0$ and $\mu$ is an eigenvalue of $A$.
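Both criteria can be read off directly for a multiplication operator. The example below is an illustration outside the original text; the operator and its spectral family are assumptions chosen because the projectors are indicator functions.

```latex
% Assumed example: (Af)(s) = s f(s) on L^2(R), with (E_t f)(s) = chi_{(-infty, t]}(s) f(s).
% Part a): for every real mu and every epsilon > 0,
\[
  E_{\mu+\varepsilon} - E_{\mu-\varepsilon}
  = \text{multiplication by } \chi_{(\mu-\varepsilon,\,\mu+\varepsilon]} \neq 0 ,
  \qquad\text{so } \sigma(A) = \mathbb{R}.
\]
% Part b): the jump of the spectral family at mu is
\[
  E(\{\mu\}) = E_\mu - E_{\mu-0}
  = \text{multiplication by } \chi_{\{\mu\}} = 0 \quad\text{in } L^2(\mathbb{R}),
\]
% because a single point has Lebesgue measure zero; hence A has no eigenvalues.
```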

The set $D_p = \{x \in D(A) : x \ne 0,\ Ax = \lambda x \text{ for some } \lambda \in \mathbb{R}\}$ of all eigenvectors of the self-adjoint operator $A$ generates the closed subspace $[D_p] = \mathcal{H}_p = \mathcal{H}_p(A)$ called the discontinuous subspace of $A$. Its orthogonal complement $\mathcal{H}_p^\perp$ is the continuous subspace $\mathcal{H}_c(A)$ of $A$, and thus one has the decomposition

$$\mathcal{H} = \mathcal{H}_p(A) \oplus \mathcal{H}_c(A)$$

of the Hilbert space.

With every spectral family $E_t$, $t \in \mathbb{R}$, one associates a family of spectral measures $(d\rho_x)_{x \in \mathcal{H}}$ on the real line $\mathbb{R}$, which are defined by $\int_a^b d\rho_x(t) = \langle x, E(a,b]x\rangle$. In terms of these spectral measures the continuous and discontinuous subspaces are characterized by


Proposition 28.1 Let $A$ be a self-adjoint operator in the complex Hilbert space $\mathcal{H}$ with spectral family $E_t$, $t \in \mathbb{R}$. For $x \in \mathcal{H}$ denote by $d\rho_x$ the spectral measure defined by the spectral family of $A$. Then

a) $x \in \mathcal{H}_p(A)$ if, and only if, there is a countable set $a \subset \mathbb{R}$ such that $E(a)x = x$ or equivalently $\rho_x(a^c) = 0$.

b) $x \in \mathcal{H}_c(A)$ if, and only if, $t \mapsto \|E_t x\|^2$ is continuous on $\mathbb{R}$ or equivalently $\rho_x(\{t\}) = 0$ for every $t \in \mathbb{R}$.


Proof If $a \subset \mathbb{R}$ is a Borel set, then $E(a)x = x$ if, and only if, $E(a^c)x = 0$ if, and only if, $\rho_x(a^c) = \|E(a^c)x\|^2 = 0$. Therefore, the two characterizations of $\mathcal{H}_p(A)$ are equivalent. Since $\mathcal{H}_p(A)$ is defined as the closure of the set of all eigenvectors of $A$, every point $x \in \mathcal{H}_p(A)$ is of the form $x = \lim_{N \to \infty} \sum_{j=1}^N c_j e_j$ with coefficients $c_j \in \mathbb{C}$ and eigenvectors $e_j$ of $A$ corresponding to eigenvalues $\lambda_j$. The list of all different eigenvalues is a countable set $a = \{\lambda_j : j \in \mathbb{N}\}$ and the corresponding projectors $E(\{\lambda_j\})$ are orthogonal and satisfy $E(\{\lambda_j\})e_j = e_j$ according to Theorem 28.2. For every $k \in \mathbb{N}$, we thus find $E(a)e_k = e_k$ and therefore $E(a)x = \lim_{N \to \infty} E(a) \sum_{j=1}^N c_j e_j = x$.

Conversely, if $x \in \mathcal{H}$ satisfies $E(a)x = x$ for some countable set $a = \{\lambda_j : j \in \mathbb{N}\}$, then $x = \lim_{N \to \infty} \sum_{j=1}^N E(\{\lambda_j\})x$ and $E(\{\lambda_j\})$ is not zero if, and only if, $\lambda_j$ is an eigenvalue (Theorem 28.2). This proves Part (a).

For every $x \in \mathcal{H}_p(A)^\perp$ and every $\lambda \in \mathbb{R}$ we find $\rho_x(\{\lambda\}) = \langle x, E(\{\lambda\})x\rangle = 0$, since by the first part $E(\{\lambda\})x \in \mathcal{H}_p(A)$.

If for $x \in \mathcal{H}$ we know $\rho_x(\{\lambda\}) = 0$ for every $\lambda \in \mathbb{R}$, then $\|E(a)x\|^2 = \rho_x(a) = 0$ for every countable set $a \subset \mathbb{R}$. For every $y \in \mathcal{H}_p(A)$ there is a countable set $a \subset \mathbb{R}$ such that $E(a)y = y$, hence $\langle x, y\rangle = \langle x, E(a)y\rangle = \langle E(a)x, y\rangle = 0$ and thus $x \in \mathcal{H}_p(A)^\perp$. The definition of the spectral measure $d\rho_x$ implies easily that the two characterizations of $\mathcal{H}_c(A)$ are equivalent.

A further decomposition of the continuous subspace of a self-adjoint operator $A$ is necessary for an even finer analysis.


Definition 28.1 For a self-adjoint operator $A$ in a complex Hilbert space $\mathcal{H}$ with spectral family $E_t$, $t \in \mathbb{R}$, the following spectral subspaces are distinguished:

a) Singularly continuous subspace $\mathcal{H}_{sc}(A)$ of $A$: $x \in \mathcal{H}_{sc}(A)$ if, and only if, there exists a Borel set $a \subset \mathbb{R}$ of Lebesgue measure zero ($|a| = 0$) such that $E(a)x = x$
b) Absolutely continuous subspace $\mathcal{H}_{ac}(A)$ of $A$: $\mathcal{H}_{ac}(A) = \mathcal{H}_c(A) \ominus \mathcal{H}_{sc}(A)$
c) Singular subspace $\mathcal{H}_s(A) = \mathcal{H}_p(A) \oplus \mathcal{H}_{sc}(A)$


In the Exercises we show that $\mathcal{H}_{sc}(A)$ is indeed a closed linear subspace of $\mathcal{H}$. Evidently these definitions imply the following decomposition of the Hilbert space into spectral subspaces of the self-adjoint operator $A$:

$$\mathcal{H} = \mathcal{H}_p(A) \oplus \mathcal{H}_c(A) = \mathcal{H}_p(A) \oplus \mathcal{H}_{sc}(A) \oplus \mathcal{H}_{ac}(A) = \mathcal{H}_s(A) \oplus \mathcal{H}_{ac}(A). \tag{28.3}$$

Again these spectral subspaces have a characterization in terms of the associated spectral measures.


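Proposition 28.2 For a self-adjoint operator $A$ in the complex Hilbert space $\mathcal{H}$ the singular and the absolutely continuous subspaces are characterized by

$$\mathcal{H}_s(A) = \{x \in \mathcal{H} : \exists \text{ Borel set } a \subset \mathbb{R} \text{ such that } |a| = 0 \text{ and } \rho_x(a^c) = 0\} = \{x \in \mathcal{H} : \rho_x \text{ is singular with respect to the Lebesgue measure}\},$$

$$\mathcal{H}_{ac}(A) = \{x \in \mathcal{H} : \text{for every Borel set } a \subset \mathbb{R} \text{ with } |a| = 0 \text{ one has } \rho_x(a) = 0\} = \{x \in \mathcal{H} : \rho_x \text{ is absolutely continuous with respect to the Lebesgue measure}\}.$$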


Proof Every $x \in \mathcal{H}_s(A)$ is the sum of a unique $y \in \mathcal{H}_p(A)$ and a unique $z \in \mathcal{H}_{sc}(A)$. According to Proposition 28.1, there is a countable set $a \subset \mathbb{R}$ such that $E(a)y = y$ and by definition of the singularly continuous subspace there is a Borel set $b \subset \mathbb{R}$ with $|b| = 0$ and $E(b)z = z$. $m = a \cup b$ is again a Borel set with Lebesgue measure zero and we have $E(m)x = E(m)E(a)y + E(m)E(b)z = E(a)y + E(b)z = x$. Then clearly $\rho_x(m^c) = 0$.

Conversely, if $x \in \mathcal{H}$ satisfies $\rho_x(m^c) = 0$ for some Borel set $m$ of measure zero, then $E(m)x = x$. Recall that $t \mapsto \|E_t x\|^2$ is a monotone increasing function of bounded total variation. Thus, it has a jump at, at most, countably many points $t_j$. Introduce the set $a = \{t_j : j \in \mathbb{N}\}$. The last proposition implies that $E(a)x \in \mathcal{H}_p(A)$. In the Exercises we show

$$E(\{t\})E(a^c)x = 0 \qquad \forall\, t \in \mathbb{R}.$$

We deduce $E(b)E(a^c)x = 0$ for every countable set $b \subset \mathbb{R}$. If $y \in \mathcal{H}_p(A)$ is given, there is a countable set $b \subset \mathbb{R}$ such that $E(b)y = y$; we calculate $\langle y, E(a^c)x\rangle = \langle E(b)y, E(a^c)x\rangle = \langle y, E(b)E(a^c)x\rangle = 0$ and see $E(a^c)x \in \mathcal{H}_p(A)^\perp = \mathcal{H}_c(A)$. Furthermore, the identity $E(m)x = x$ implies $E(m)E(a^c)x = E(m)x - E(m)E(a)x = E(a^c)x$. Therefore, the vector $E(a^c)x$ belongs to the singularly continuous subspace. The identity $x = E(a)x + E(a^c)x \in \mathcal{H}_p(A) \oplus \mathcal{H}_{sc}(A)$ finally proves the first part.

To prove the second part take any $x \in \mathcal{H}$ and suppose that for every Borel set $a \subset \mathbb{R}$ with $|a| = 0$ we know $\rho_x(a) = 0$ and therefore $E(a)x = 0$. For every $y \in \mathcal{H}_p(A)$, there is a countable set $a \subset \mathbb{R}$ such that $E(a)y = y$ and for every $z \in \mathcal{H}_{sc}(A)$, there is a Borel set $b \subset \mathbb{R}$ such that $|b| = 0$ and $E(b)z = z$. This implies $\langle x, y + z\rangle = \langle x, E(a)y\rangle + \langle x, E(b)z\rangle = \langle E(a)x, y\rangle + \langle E(b)x, z\rangle = 0 + 0 = 0$, hence $x \in \mathcal{H}_s(A)^\perp = \mathcal{H}_{ac}(A)$.

For $x \in \mathcal{H}$ and any Borel set $b \subset \mathbb{R}$ with $|b| = 0$, one knows $E(b)x \in \mathcal{H}_s(A)$ according to the first part. If now $x \in \mathcal{H}_{ac}(A)$ is given and $b \subset \mathbb{R}$ is any Borel set with $|b| = 0$, we find $\rho_x(b) = \|E(b)x\|^2 = \langle x, E(b)x\rangle = 0$, which proves the characterization of $\mathcal{H}_{ac}(A)$.

There is another way to introduce these spectral subspaces of a self-adjoint operator $A$ in a Hilbert space $\mathcal{H}$. As we know, for every $x \in \mathcal{H}$ the spectral measure $d\rho_x$ is a Borel measure on the real line $\mathbb{R}$. Lebesgue’s decomposition theorem (see for instance [2]) for such measures states that $d\rho_x$ has a unique decomposition into pairwise singular measures

$$d\rho_x = d\rho_{x,pp} + d\rho_{x,sc} + d\rho_{x,ac} \tag{28.4}$$

with the following specification of the three measures: $d\rho_{x,pp}$ is a pure point measure, i.e., there are at most countably many points $t_j$ such that $\rho_{x,pp}(\{t_j\}) \ne 0$. $d\rho_{x,sc}$ is a continuous measure, i.e., $\rho_{x,sc}(\{t\}) = 0$ for all $t \in \mathbb{R}$, which is singular with respect to the Lebesgue measure, i.e., there is a Borel set $a \subset \mathbb{R}$ such that $\rho_{x,sc}(a) = 0$ while $|a^c| = 0$. Finally, $d\rho_{x,ac}$ is a Borel measure which is absolutely continuous with respect to the Lebesgue measure, i.e., for every Borel set $b \subset \mathbb{R}$ with $|b| = 0$, one has $\rho_{x,ac}(b) = 0$.

As a consequence, we have the following decomposition of the corresponding $L^2$-space:

$$L^2(\mathbb{R}, d\rho_x) = L^2(\mathbb{R}, d\rho_{x,pp}) \oplus L^2(\mathbb{R}, d\rho_{x,sc}) \oplus L^2(\mathbb{R}, d\rho_{x,ac}). \tag{28.5}$$

In the terminology of Lebesgue’s decomposition theorem we can reformulate the definition of the various spectral subspaces:

$$\mathcal{H}_p(A) = \{x \in \mathcal{H} : d\rho_x \text{ is a pure point measure on } \mathbb{R}\};$$

$$\mathcal{H}_{sc}(A) = \{x \in \mathcal{H} : d\rho_x \text{ is continuous and singular with respect to the Lebesgue measure}\};$$

$$\mathcal{H}_{ac}(A) = \{x \in \mathcal{H} : d\rho_x \text{ is absolutely continuous with respect to the Lebesgue measure}\}.$$

Therefore, because of the spectral theorem and our previous characterization of the spectral subspaces, the decompositions (28.3) and (28.5) correspond to each other and thus in the sense of Lebesgue measure theory this decomposition is natural.
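A small example shows how the decomposition of the spectral measure sorts vectors into spectral subspaces. It is an illustration under stated assumptions: the Hilbert space, the operator and the symbol $\delta_2$ for the Dirac measure at the point 2 are choices made here, not taken from the text.

```latex
% Assumed example: H = C (+) L^2([0,1]),  A(c, f) = (2c, s f(s)).
% For x = (c, f) the spectral measure is
\[
  d\rho_x = |c|^2\, \delta_2 + |f(s)|^2\, ds ,
\]
% i.e. d rho_{x,pp} = |c|^2 delta_2,  d rho_{x,ac} = |f(s)|^2 ds,  d rho_{x,sc} = 0.
% Consequently
\[
  \mathcal{H}_p(A) = \mathbb{C} \oplus \{0\}, \qquad
  \mathcal{H}_{ac}(A) = \{0\} \oplus L^2([0,1]), \qquad
  \mathcal{H}_{sc}(A) = \{0\}.
\]
```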

We proceed by showing that the given self-adjoint operator $A$ has a restriction
$A_i = A|_{D_i}$, $D_i = D(A) \cap \mathcal{H}_i$, to its spectral subspace $\mathcal{H}_i = \mathcal{H}_i(A)$ where $i$ stands
for $p, c, sc, ac, s$. This is done by proving that these spectral subspaces are reducing
for the operator $A$.


Theorem 28.3 Let $A$ be a self-adjoint operator in the complex Hilbert space $\mathcal{H}$.
Then the restriction $A_i$ of $A$ to the spectral subspace $\mathcal{H}_i$ is a self-adjoint operator in
the Hilbert space $\mathcal{H}_i$, $i = p, c, sc, ac, s$.




Proof Denote by $P_i$ the orthogonal projector from $\mathcal{H}$ onto the spectral subspace $\mathcal{H}_i$.
Recall that $\mathcal{H}_i$ is a reducing subspace for the operator $A$ if


\[ P_i D(A) \subseteq D(A) \quad\text{and}\quad A P_i x = P_i A x \quad \forall x \in D(A). \]


We verify this condition explicitly for the case i = p, i.e., for the restriction to the 
discontinuous subspace. 

According to Proposition 28.1, a point $x \in \mathcal{H}$ belongs to the discontinuous
subspace $\mathcal{H}_p(A)$ if, and only if, there is a countable set $a \subset \mathbb{R}$ such that $E(a)x = x$. The
projector $E(a)$ commutes with all the projectors $E_t$, $t \in \mathbb{R}$, of the spectral family $E$ of
$A$. Thus, $x \in \mathcal{H}_p(A)$ implies $E_t x \in \mathcal{H}_p(A)$ for all $t \in \mathbb{R}$ and therefore $E_t P_p = P_p E_t$
for all $t \in \mathbb{R}$.

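The spectral theorem says: $x \in D(A)$ if, and only if, $\int_{\mathbb{R}} t^2\, d\|E_t x\|^2 < \infty$. For
$x \in D(A)$, we thus find


\[ \int_{\mathbb{R}} t^2\, d\|E_t P_p x\|^2 = \int_{\mathbb{R}} t^2\, d\|P_p E_t x\|^2 \le \int_{\mathbb{R}} t^2\, d\|E_t x\|^2 < \infty. \]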


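This proves $P_p x \in D(A)$. Now we apply again the spectral theorem to calculate for
$x \in D(A)$


\[ A P_p x = \int_{\mathbb{R}} t\, dE_t P_p x = \int_{\mathbb{R}} t\, dP_p E_t x = \int_{\mathbb{R}} t\, P_p\, dE_t x = P_p \int_{\mathbb{R}} t\, dE_t x = P_p A x. \]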


It follows that $\mathcal{H}_p(A)$ is a reducing subspace for the self-adjoint operator $A$. We
conclude that the restriction of $A$ to this reducing subspace is self-adjoint.
In the Exercises the reader is asked to fill in some details and to prove the remaining 
cases. 
The last result enables the definition of those parts of the spectrum of a self-adjoint 
operator A which correspond to the various spectral subspaces. 


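\[
\begin{aligned}
\sigma_c(A) &= \sigma(A_c) = \text{continuous spectrum of } A,\\
\sigma_{sc}(A) &= \sigma(A_{sc}) = \text{singularly continuous spectrum of } A,\\
\sigma_{ac}(A) &= \sigma(A_{ac}) = \text{absolutely continuous spectrum of } A,\\
\sigma_s(A) &= \sigma(A_s) = \text{singular spectrum of } A.
\end{aligned}
\]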


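The point spectrum $\sigma_p(A)$, however, is defined as the set of all eigenvalues of $A$.
Since the spectrum of an operator is always closed while the set of eigenvalues need not be,
this means that in general we only have


\[ \overline{\sigma_p(A)} = \sigma(A_p). \]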


Corresponding to the definition of the various spectral subspaces (Definition 28.1), 
the spectrum of a self-adjoint operator A can be decomposed as follows: 


\[ \sigma(A) = \overline{\sigma_p(A)} \cup \sigma_{sc}(A) \cup \sigma_{ac}(A) = \sigma_s(A) \cup \sigma_{ac}(A) = \overline{\sigma_p(A)} \cup \sigma_c(A). \tag{28.6} \]




There is a third way to decompose the spectrum of a self-adjoint operator into two
parts. Denote by $\sigma_d(A)$ the set of those isolated points of $\sigma(A)$ which are eigenvalues
of finite multiplicity. This set is the discrete spectrum $\sigma_d(A)$. The remaining set
$\sigma_e(A) = \sigma(A) \setminus \sigma_d(A)$ is called the essential spectrum of $A$,


\[ \sigma(A) = \sigma_d(A) \cup \sigma_e(A). \tag{28.7} \]


As we are going to show, the essential spectrum has remarkable stability properties 
with regard to certain changes of the operator. But first the essential spectrum has to 
be characterized more explicitly. 


Theorem 28.4 For a self-adjoint operator A in a complex Hilbert space H with 
spectral family E, the following statements are equivalent. 


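a) $\lambda \in \sigma_e(A)$

b) There is a sequence $(x_n)_{n\in\mathbb{N}} \subset D(A)$ such that
b1) $(x_n)_{n\in\mathbb{N}}$ converges weakly to 0
b2) $\liminf_{n\to\infty} \|x_n\| > 0$
b3) $\lim_{n\to\infty} (A - \lambda I)x_n = 0$

c) $\dim(\operatorname{ran}(E_{\lambda+r} - E_{\lambda-r})) = \infty$ for every $r > 0$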


Proof Suppose $\lambda \in \sigma_e(A)$. If $\lambda$ is an eigenvalue of infinite multiplicity, then there
is an infinite orthonormal system of eigenvectors $x_n$. Such a system is known to
converge weakly to 0 and thus (b) holds in this case. Next, suppose that $\lambda$ is an
accumulation point of the spectrum of $A$. Then there is a sequence $(\lambda_n)_{n\in\mathbb{N}} \subset \sigma(A)$
with the following properties:


\[ \lim_{n\to\infty} \lambda_n = \lambda, \qquad \lambda_n \neq \lambda, \qquad \lambda_n \neq \lambda_m \quad \forall n,m \in \mathbb{N},\ n \neq m. \]

Hence, there is a sequence of numbers $r_n > 0$, which converges to zero such that
the intervals $(\lambda_n - r_n, \lambda_n + r_n)$ are pairwise disjoint. Points of the spectrum have
been characterized in Theorem 28.2. Thus, we know for $\lambda_n \in \sigma(A)$ that
$E_{\lambda_n+r_n} - E_{\lambda_n-r_n} \neq 0$. Therefore, we can find a normalized vector $x_n$ in the range of the
projector $E_{\lambda_n+r_n} - E_{\lambda_n-r_n}$ for all $n \in \mathbb{N}$. Since the intervals $(\lambda_n - r_n, \lambda_n + r_n)$ are
pairwise disjoint, the projectors $E_{\lambda_n+r_n} - E_{\lambda_n-r_n}$ are pairwise orthogonal and we
deduce $\langle x_n, x_m\rangle = \delta_{nm}$. The identity


\[ \|(A - \lambda I)x_n\|^2 = \int_{\mathbb{R}} (t - \lambda)^2\, d\|E_t x_n\|^2 = \int_{\lambda_n - r_n}^{\lambda_n + r_n} (t - \lambda)^2\, d\|E_t x_n\|^2 \]


implies $\lim_{n\to\infty} (A - \lambda I)x_n = 0$, since $\lim_{n\to\infty} \lambda_n = \lambda$ and $\lim_{n\to\infty} r_n = 0$. Again,
since infinite orthonormal systems converge weakly to 0, statement (b) holds in this
case too. Thus, (a) implies (b).

Now assume (b). An indirect proof will show that then (c) holds. Suppose that there
is some $r > 0$ such that the projector $E_{\lambda+r} - E_{\lambda-r}$ has a finite dimensional range. Then
this projector is compact. Since compact operators map weakly convergent sequences
onto strongly convergent ones, we know for any sequence $(x_n)_{n\in\mathbb{N}}$ satisfying (b) that
$\lim_{n\to\infty} (E_{\lambda+r} - E_{\lambda-r})x_n = 0$. Now observe the lower bound


\[
\begin{aligned}
\|(A - \lambda I)x_n\|^2 &= \int_{\mathbb{R}} |t - \lambda|^2\, d\|E_t x_n\|^2 \ge r^2 \left( \int_{\mathbb{R}} d\|E_t x_n\|^2 - \int_{\lambda - r}^{\lambda + r} d\|E_t x_n\|^2 \right)\\
&= r^2 \left( \|x_n\|^2 - \|(E_{\lambda+r} - E_{\lambda-r})x_n\|^2 \right)
\end{aligned}
\]


which gives

\[ \|x_n\|^2 \le \|(E_{\lambda+r} - E_{\lambda-r})x_n\|^2 + \frac{1}{r^2}\|(A - \lambda I)x_n\|^2, \]


and thus a contradiction between (b2), (b3) and the implication of (b1) given above.
Finally suppose (c). We have to distinguish two cases: 


$\alpha$) $\dim(\operatorname{ran}(E_\lambda - E_{\lambda-0})) = \infty$, $\qquad\beta$) $\dim(\operatorname{ran}(E_\lambda - E_{\lambda-0})) < \infty$.


In the first case we know by Theorem 28.2 that $\lambda$ is an eigenvalue of infinite
multiplicity and therefore $\lambda \in \sigma_e(A)$.
Now consider the second case. By assumption we know that


\[ E_{\lambda+r} - E_{\lambda-r} = (E_{\lambda+r} - E_\lambda) + (E_\lambda - E_{\lambda-0}) + (E_{\lambda-0} - E_{\lambda-r}) \]


is a projector of infinite dimensional range for every $r > 0$. The three projectors of
this decomposition are orthogonal to each other since the corresponding intervals are
disjoint. Therefore, the sum of the projectors $(E_{\lambda+r} - E_\lambda) + (E_{\lambda-0} - E_{\lambda-r})$ has an
infinite dimensional range and thus (Theorem 28.2) in particular
$[(\lambda - r, \lambda) \cup (\lambda, \lambda + r)] \cap \sigma(A) \neq \emptyset$ for every $r > 0$. This means that $\lambda$ is an accumulation point of the
spectrum of $A$, i.e., $\lambda \in \sigma_e(A)$. We conclude that (c) implies (a).


Remark 28.1 From the proof of this theorem it is evident that condition (b) could 
be reformulated as 


There is an infinite orthonormal system $\{x_n : n \in \mathbb{N}\}$ with the property $\lim_{n\to\infty} (A - \lambda I)x_n = 0$.


This characterization (b) of the points of the essential spectrum is the key to the 
proof of the following theorem on the “invariance” of the essential spectrum under 
“perturbations” of the operator A. 


Theorem 28.5 (Theorem of Weyl) Suppose that $A$ and $B$ are two self-adjoint
operators in the complex Hilbert space $\mathcal{H}$. If there is a $z \in \rho(A) \cap \rho(B)$ such
that

\[ T = (A - zI)^{-1} - (B - zI)^{-1} \]


is a compact operator, then the essential spectra of $A$ and $B$ agree: $\sigma_e(A) = \sigma_e(B)$.


Proof We show first $\sigma_e(A) \subseteq \sigma_e(B)$. Take any $\lambda \in \sigma_e(A)$. Then there is a sequence
$(x_n)_{n\in\mathbb{N}}$ which satisfies condition (b) of Theorem 28.4 for the operator $A$. For all
$n \in \mathbb{N}$, define $y_n = (A - zI)x_n = (A - \lambda I)x_n + (\lambda - z)x_n$. It follows that this
sequence converges weakly to 0 and the estimate $\|y_n\| \ge |\lambda - z|\,\|x_n\| - \|(A - \lambda I)x_n\|$,
valid for sufficiently large $n \in \mathbb{N}$, implies $\liminf_{n\to\infty} \|y_n\| > 0$. Next, we take the
identity


\[ [(B - zI)^{-1} - (\lambda - z)^{-1}I]\,y_n = -T y_n - (\lambda - z)^{-1}(A - \lambda I)x_n \]


into account. Since $T$ is compact and the sequence $(y_n)_{n\in\mathbb{N}}$ converges weakly to 0,
we deduce from condition (b3) that


\[ \lim_{n\to\infty} [(B - zI)^{-1} - (\lambda - z)^{-1}I]\,y_n = 0. \]

Now introduce the sequence $z_n = (B - zI)^{-1}y_n$, $n \in \mathbb{N}$. Clearly $z_n \in D(B)$ for all
$n \in \mathbb{N}$ and this sequence converges weakly to 0. From the limit relation given above
we see $\liminf_{n\to\infty} \|z_n\| > 0$. This limit relation also implies


\[ \lim_{n\to\infty} (B - \lambda I)z_n = 0 \]

since $(B - \lambda I)z_n = (B - zI)z_n + (z - \lambda)z_n = y_n + (z - \lambda)(B - zI)^{-1}y_n
= (z - \lambda)[(B - zI)^{-1} - (\lambda - z)^{-1}I]\,y_n$.
Therefore, the sequence $(z_n)_{n\in\mathbb{N}}$ satisfies condition (b) for the operator $B$ and our
previous theorem implies that $\lambda$ is a point of the essential spectrum of the operator $B$.


Since with $T$ also the operator $-T$ is compact, we can exchange in the above
proof the role of the operators $A$ and $B$. Then we get $\sigma_e(B) \subseteq \sigma_e(A)$, and thus
equality of the essential spectra.


28.3 Interpretation of the Spectrum of a Self-Adjoint 
Hamiltonian 


For a self-adjoint operator $A$ in a complex Hilbert space, one can form the
one-parameter group of unitary operators $U(t) = e^{-itA}$, and one can identify several
spectral subspaces $\mathcal{H}_i(A)$ for this operator. It follows that this unitary group leaves
the spectral subspaces invariant but it behaves quite differently on different spectral
subspaces. This behavior we study in this section, but for the more concrete case of a
self-adjoint Hamiltonian in the Hilbert space $\mathcal{H} = L^2(\mathbb{R}^3)$ where a concrete physical
and intuitive interpretation is available. These investigations lead naturally to the 
quantum mechanical scattering theory for which there are quite a number of detailed 
expositions, for instance the books [3, 4]. Certainly, we cannot give a systematic 
presentation of scattering here, we just mention a few basic and important facts in a 
special context, thus indicating some of the major difficulties. 

In quantum mechanics the dynamics of a free particle of mass $m > 0$ is governed
by the free Hamilton operator $H_0 = \frac{1}{2m}P^2$. Its spectrum has been determined to
be $\sigma(H_0) = \sigma_c(H_0) = [0, \infty)$. In case of an interaction the dynamics certainly
is changed. If $V(Q)$ is the interaction operator the dynamics is determined by the
Hamilton operator

\[ H = H_0 + V(Q). \]


We have discussed several possibilities to ensure that this Hamilton operator is self- 
adjoint (see Theorem 23.9). Here we work under the following assumptions: 


V(Q) is defined and symmetric on the domain D of the free Hamilton operator. 
H = Ho + V(Q) is self-adjoint and lower bounded on D. 


These two self-adjoint operators generate two one-parameter groups of unitary
operators in $L^2(\mathbb{R}^3)$:


\[ U_t^0 = e^{-\frac{i}{\hbar}tH_0}, \qquad U_t = U_t(V) = e^{-\frac{i}{\hbar}tH}, \qquad \forall t \in \mathbb{R}. \]


Recall: If $\phi_0 \in D(H)$, then $\phi(t) = U_t\phi_0$ is the solution of the Schrödinger equation


\[ i\hbar \frac{d}{dt}\phi(t) = H\phi(t) \]


for the initial condition $\phi(t = 0) = \phi_0$. This change with time of states can also be
expressed as a time change of observables $A$, according to the Heisenberg equation


\[ A_t = e^{\frac{i}{\hbar}tH} A\, e^{-\frac{i}{\hbar}tH}. \]


Quantum scattering theory studies the long-term behavior of solutions of the
Schrödinger equation. If $\lambda$ is an eigenvalue of $H$ with eigenvector $\phi_0$, then by
functional calculus $U_t\phi_0 = e^{-\frac{i}{\hbar}t\lambda}\phi_0$ and the localization properties of this eigenvector
do not change under the dynamics.
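
The statement about eigenvectors can be checked in a finite-dimensional analogue. The following is only a sketch under stated assumptions: the 3x3 Hermitian matrix `H`, the helper `evolve`, and the choice of units with `hbar = 1` are hypothetical and not taken from the text. It evaluates $U_t = e^{-\frac{i}{\hbar}tH}$ through the spectral decomposition of the matrix and confirms that an eigenvector only acquires a phase, so that $|(U_t\phi_0)_j|^2$ does not depend on $t$.

```python
import numpy as np

def evolve(H, phi0, t, hbar=1.0):
    """Apply U_t = exp(-i t H / hbar) to phi0 via the spectral decomposition of H."""
    evals, evecs = np.linalg.eigh(H)            # H = V diag(evals) V^*
    coeffs = evecs.conj().T @ phi0              # coordinates of phi0 in the eigenbasis
    return evecs @ (np.exp(-1j * t * evals / hbar) * coeffs)

# Hypothetical Hermitian matrix standing in for the Hamiltonian (illustration only)
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

evals, evecs = np.linalg.eigh(H)
lam, phi0 = evals[0], evecs[:, 0]               # an eigenvalue and a normalized eigenvector
t = 0.7
phi_t = evolve(H, phi0, t)

# U_t phi0 equals the phase e^{-i t lam} times phi0 ...
phase_ok = np.allclose(phi_t, np.exp(-1j * t * lam) * phi0)
# ... hence the "position" distribution |phi_j|^2 is unchanged.
density_ok = np.allclose(np.abs(phi_t) ** 2, np.abs(phi0) ** 2)
print(phase_ok, density_ok)
```

In the infinite-dimensional setting the same computation is carried out by the functional calculus; the matrix case merely makes the phase factor visible.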

For potentials $V \neq 0$ which decay to 0 for $|x| \to \infty$, one expects that the particle
can “escape to infinity” for certain initial states $\phi_0$ and that its time evolution $U_t(V)\phi_0$
approaches that of the free dynamics $U_t(V = 0)\psi_0$ for a certain initial state $\psi_0$, since
“near infinity” the effect of the potential should be negligible. This expectation can
be confirmed, in a suitable framework.

According to classical mechanics, we expect to find two classes of states for the 
dynamics described by the Hamilton operator H: 


a) In some states the particle remains localized in a bounded region of $\mathbb{R}^3$, for all
times $t \in \mathbb{R}$ (as the eigenstate mentioned above). States describing such behavior
are called bound states.

b) In certain states $\phi$ the particle can “escape to infinity” under the time evolution
$U_t$. Such states are called scattering states.


Certainly, we have to give a rigorous meaning to these two heuristic concepts
of a bound and of a scattering state. This is done in terms of Born's probability
interpretation of quantum mechanics. Given $\phi \in L^2(\mathbb{R}^3)$ with $\|\phi\| = 1$ define


\[ m(U_t\phi, \Delta) = \int_\Delta |(U_t\phi)(x)|^2\, dx = \|\chi_\Delta U_t\phi\|_2^2. \tag{28.8} \]


$m(U_t\phi, \Delta)$ is the probability of finding the particle at time $t$ in the region $\Delta \subset \mathbb{R}^3$.
$\chi_\Delta$ is the characteristic function of the set $\Delta$.


Definition 28.2 $\phi \in L^2(\mathbb{R}^3)$ is called a bound state for the Hamilton operator $H$
if, and only if, for every $\varepsilon > 0$ there is a compact set $K \subset \mathbb{R}^3$ such that
$m(U_t\phi, K) \ge 1 - \varepsilon$ for all $t \in \mathbb{R}$.

$\psi \in L^2(\mathbb{R}^3)$ is called a scattering state for the Hamilton operator $H$ if, and only
if, for every compact set $K \subset \mathbb{R}^3$ one has $m(U_t\psi, K) \to 0$ as $|t| \to \infty$.
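
To make the definition of a scattering state concrete, here is a small numerical sketch. It rests on assumptions that are not in the text: one space dimension instead of three, units with $\hbar = m = 1$, a periodic grid (so that the free evolution becomes a Fourier multiplier), and the hypothetical helper names `free_evolution` and `prob_in_region`. For a Gaussian initial state evolving under the free Hamiltonian, the probability of finding the particle in the fixed compact set $K = [-5, 5]$ decreases as $t$ grows, while the total norm is conserved.

```python
import numpy as np

def free_evolution(phi0, x, t, hbar=1.0, mass=1.0):
    """Free evolution exp(-i t H_0 / hbar) phi0 on a periodic grid, computed by FFT."""
    dx = x[1] - x[0]
    k = 2.0 * np.pi * np.fft.fftfreq(x.size, d=dx)        # wave numbers of the grid
    phase = np.exp(-1j * hbar * k**2 * t / (2.0 * mass))  # multiplier of H_0 = P^2 / (2m)
    return np.fft.ifft(phase * np.fft.fft(phi0))

def prob_in_region(phi, x, radius):
    """Discrete analogue of m(U_t phi, K) for K = [-radius, radius]."""
    dx = x[1] - x[0]
    mask = np.abs(x) <= radius
    return float(np.sum(np.abs(phi[mask]) ** 2) * dx)

# Grid large enough that the wave packet never reaches the (periodic) boundary
x = np.linspace(-200.0, 200.0, 4096, endpoint=False)
sigma = 1.0
phi0 = (2.0 * np.pi * sigma**2) ** (-0.25) * np.exp(-x**2 / (4.0 * sigma**2))

probs = {t: prob_in_region(free_evolution(phi0, x, t), x, 5.0) for t in (0.0, 5.0, 20.0)}
for t, p in probs.items():
    print(f"t = {t:5.1f}   m(U_t phi, K) = {p:.4f}")
```

The decay seen here for one compact set is of course no proof; the definition requires it for every compact $K$ and for $|t| \to \infty$.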

Bound states and scattering states have an alternative characterization which in 
most applications is more convenient to use. 


Lemma 28.1 a) $\phi \in L^2(\mathbb{R}^3)$ is a bound state for the Hamiltonian $H$ if, and only if,


\[ \lim_{R\to\infty} \sup_{t\in\mathbb{R}} \|F_{>R}U_t\phi\|_2 = 0 \tag{28.9} \]


where $F_{>R}$ is the characteristic function of the set $\{x \in \mathbb{R}^3 : |x| > R\}$.
b) $\phi \in L^2(\mathbb{R}^3)$ is a scattering state for the Hamiltonian $H$ if, and only if, for every
$R \in (0, \infty)$,


\[ \lim_{|t|\to\infty} \|F_{<R}U_t\phi\|_2 = 0 \tag{28.10} \]


where $F_{<R}$ is the characteristic function of the set $\{x \in \mathbb{R}^3 : |x| \le R\}$.


Proof The proof is a straightforward exercise. 

Denote the set of all bound states for a given Hamiltonian $H$ in $L^2(\mathbb{R}^3)$ by $M_b(H)$
and by $M_s(H)$ the set of all scattering states for this Hamilton operator. The following
lemma describes some basic facts about these sets.


Lemma 28.2 The sets of all bound states, respectively of all scattering states of a
Hamilton operator $H$ are closed subspaces in $L^2(\mathbb{R}^3)$ which are orthogonal to each
other: $M_b(H) \perp M_s(H)$. Both subspaces are invariant under the group $U_t$.


Proof The characterization (28.9) of bound states and the basic rules of calculation
for limits immediately imply that $M_b(H)$ is a linear subspace. The same applies to
$M_s(H)$. Also, invariance under the group $U_t$ is evident from the defining identities
for these subspaces.

Suppose that $\phi \in L^2(\mathbb{R}^3)$ is an element of the closure of $M_b(H)$. Then there is a
sequence $(\phi_n)_{n\in\mathbb{N}} \subset M_b(H)$ with limit $\phi$. For $R > 0$ and $t \in \mathbb{R}$ we estimate with
arbitrary $n \in \mathbb{N}$,


\[ \|F_{>R}U_t\phi\|_2 \le \|F_{>R}U_t(\phi - \phi_n)\|_2 + \|F_{>R}U_t\phi_n\|_2 \le \|\phi - \phi_n\|_2 + \|F_{>R}U_t\phi_n\|_2. \]


For a given $\varepsilon > 0$, there is an $n \in \mathbb{N}$ such that $\|\phi - \phi_n\|_2 < \varepsilon/2$, and since
$\phi_n \in M_b(H)$ there is an $R_n \in (0, \infty)$ such that $\|F_{>R}U_t\phi_n\|_2 < \varepsilon/2$ for all $R > R_n$
and all $t \in \mathbb{R}$. Therefore, $\|F_{>R}U_t\phi\|_2 < \varepsilon$ for all $t \in \mathbb{R}$ and all $R > R_n$ and thus
condition (28.9) holds. This proves that the linear space of all bound states is closed.
The proof that the space of all scattering states is closed is similar (see Exercises).

Since $U_t$ is unitary, we find for $\phi \in M_b(H)$ and $\psi \in M_s(H)$,


\[ \langle\phi, \psi\rangle_2 = \langle U_t\phi, U_t\psi\rangle_2 = \langle F_{>R}U_t\phi, U_t\psi\rangle_2 + \langle U_t\phi, F_{<R}U_t\psi\rangle_2 \]


and thus for all $t \in \mathbb{R}$ and all $0 < R < \infty$,


\[ |\langle\phi, \psi\rangle_2| \le \|F_{>R}U_t\phi\|_2\, \|\psi\|_2 + \|\phi\|_2\, \|F_{<R}U_t\psi\|_2. \]


In the first term take the limit $R \to \infty$ and in the second term the limit $|t| \to \infty$ and
observe Eq. (28.9), respectively Eq. (28.10), to conclude $\langle\phi, \psi\rangle_2 = 0$. This proves
orthogonality of the spaces $M_b(H)$ and $M_s(H)$.

There is a fundamental connection between the spaces of bound states, respec- 
tively scattering states, on one side and the spectral subspaces of the Hamiltonian on 
the other side. A first step in establishing this connection is taken in the following 
proposition. 


Proposition 28.3 For a self-adjoint Hamilton operator $H$ in $L^2(\mathbb{R}^3)$, every
normalized vector of the discontinuous subspace is a bound state and every scattering
state belongs to the continuous subspace, i.e.


\[ \mathcal{H}_p(H) \subseteq M_b(H), \qquad M_s(H) \subseteq \mathcal{H}_c(H). \tag{28.11} \]


Proof For an eigenvector $\phi$ of the Hamiltonian $H$ with eigenvalue $E$, the time
dependence is $U_t\phi = e^{-\frac{i}{\hbar}tE}\phi$ and thus $\|F_{>R}U_t\phi\|_2^2 = \int_{|x|>R} |\phi(x)|^2\, dx \to 0$ as
$R \to \infty$, for every $t \in \mathbb{R}$, and condition (28.9) follows, i.e., $\phi \in M_b(H)$. Since
$M_b(H)$ is a closed linear subspace this proves $\mathcal{H}_p(H) \subseteq M_b(H)$. By taking the orthogonal
complements we find $M_b(H)^\perp \subseteq \mathcal{H}_p(H)^\perp = \mathcal{H}_c(H)$. Finally, recall $M_s(H) \subseteq M_b(H)^\perp$,
and the proof is complete.

Heuristic considerations seem to indicate that the state of a quantum mechanical
particle should be either a bound state or a scattering state, i.e., that the total Hilbert
space $\mathcal{H} = L^2(\mathbb{R}^3)$ has the decomposition

\[ \mathcal{H} = M_b(H) \oplus M_s(H). \]

Unfortunately this is not true in general. Nevertheless, a successful strategy is known 
which allows us to establish this decomposition under certain assumptions on the 
Hamilton operator. 

Suppose that we can show 


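\[ \text{A.}\ \ \mathcal{H}_{ac}(H) \subseteq M_s(H), \qquad \text{B.}\ \ \mathcal{H}_{sc}(H) = \{0\}. \tag{28.12} \]


Then, because of $\mathcal{H} = \mathcal{H}_p(H) \oplus \mathcal{H}_c(H)$, $\mathcal{H}_c(H) = \mathcal{H}_{ac}(H) \oplus \mathcal{H}_{sc}(H)$, and the general
relations shown above, one has indeed


\[ \mathcal{H}_p(H) = M_b(H), \qquad \mathcal{H}_{ac}(H) = M_s(H), \qquad \mathcal{H} = M_b(H) \oplus M_s(H). \tag{28.13} \]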


While the verification of Part A) of (28.12) is relatively straightforward, the
implementation of Part B) is quite involved. Thus, for this part we just mention some basic
results and have to refer to the specialized literature on (mathematical) scattering theory for
the proofs.
The starting point for the proof of $\mathcal{H}_{ac}(H) \subseteq M_s(H)$ is the following lemma.


Lemma 28.3 For all $\psi \in \mathcal{H}_{ac}(H)$ the time evolution $U_t\psi$ converges weakly to 0
for $|t| \to \infty$.


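Proof The strategy of the proof is to show with the help of the spectral theorem and
the characterization of elements $\psi$ in $\mathcal{H}_{ac}(H)$ in terms of properties of the spectral
measure $d\rho_\psi$, that for every $\phi \in \mathcal{H}$ the function $t \mapsto \langle\phi, U_t\psi\rangle$ is the Fourier transform
of a function $F_{\phi,\psi} \in L^1(\mathbb{R})$ and then to apply the Riemann-Lebesgue Lemma (which
states that the Fourier transform of a function in $L^1(\mathbb{R})$ is a continuous function which
vanishes at infinity, see Lemma 10.1).

For arbitrary $\phi \in \mathcal{H}$ spectral calculus allows us to write


\[ \langle\phi, U_t\psi\rangle = \int_{\mathbb{R}} e^{-\frac{i}{\hbar}ts}\, d\langle\phi, E_s\psi\rangle \]


for all $t \in \mathbb{R}$. Let $\Delta \subset \mathbb{R}$ be a Borel set. Then $\int_\Delta d\langle\phi, E_s\psi\rangle = \langle\phi, E(\Delta)\psi\rangle$. Denote
by $P_{ac}$ the orthogonal projector onto the subspace $\mathcal{H}_{ac}(H)$. It is known to commute
with $E(\Delta)$ and therefore we have $\langle\phi, E(\Delta)\psi\rangle = \langle\phi, E(\Delta)P_{ac}\psi\rangle = \langle P_{ac}\phi, E(\Delta)\psi\rangle$.
Thus, the estimate $|\langle\phi, E(\Delta)\psi\rangle| \le \|E(\Delta)P_{ac}\phi\|\, \|E(\Delta)\psi\|$ follows. According to
Proposition 28.2, $\psi \in \mathcal{H}_{ac}(H)$ is characterized by the fact that the spectral measure
$d\rho_\psi(s) = d\|E_s\psi\|^2$ is absolutely continuous with respect to the Lebesgue measure
on $\mathbb{R}$, i.e., there is a nonnegative function $f_\psi$ such that $d\rho_\psi(s) = f_\psi(s)\,ds$. Since
$\int_{\mathbb{R}} d\rho_\psi(s) = \|\psi\|^2$, we find $0 \le f_\psi \in L^1(\mathbb{R})$.

The estimate $|\langle\phi, E(\Delta)\psi\rangle| \le \|E(\Delta)P_{ac}\phi\|\, \|E(\Delta)\psi\|$ implies that the measure
$d\langle\phi, E_s\psi\rangle$ too is absolutely continuous with respect to the Lebesgue measure; hence,
there is a function $F_{\phi,\psi}$ on $\mathbb{R}$ such that $d\langle\phi, E_s\psi\rangle = F_{\phi,\psi}(s)\,ds$. The above estimate
also implies $|F_{\phi,\psi}(s)| \le \sqrt{f_{P_{ac}\phi}(s)\, f_\psi(s)}$, thus $F_{\phi,\psi} \in L^1(\mathbb{R})$. We conclude that


\[ \langle\phi, U_t\psi\rangle = \int_{\mathbb{R}} e^{-\frac{i}{\hbar}ts}\, F_{\phi,\psi}(s)\, ds \]


is the Fourier transform of an absolutely integrable function, and therefore is a
continuous function, which vanishes for $|t| \to \infty$.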
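
The decay mechanism behind this proof, the Riemann-Lebesgue Lemma, can be observed numerically. The sketch below makes several assumptions that are not from the text: the density $F(s) = e^{-s}$ for $s \ge 0$ (and $0$ otherwise) is a hypothetical stand-in for $F_{\phi,\psi}$, units are chosen with $\hbar = 1$, and the integral is truncated to $[0, 40]$ and approximated by the trapezoidal rule. For this choice the exact transform is $1/(1 + it)$, with modulus $1/\sqrt{1 + t^2}$, which the computed values reproduce.

```python
import numpy as np

s = np.linspace(0.0, 40.0, 40001)     # grid on [0, 40]; the tail beyond 40 is negligible
F = np.exp(-s)                        # hypothetical L^1 density F(s) = e^{-s}, s >= 0

def fourier_transform(t):
    """Trapezoidal approximation of the integral of e^{-i t s} F(s) over [0, 40]."""
    vals = np.exp(-1j * t * s) * F
    ds = s[1] - s[0]
    return ds * (np.sum(vals) - 0.5 * (vals[0] + vals[-1]))

for t in (0.0, 1.0, 10.0, 50.0):
    approx = abs(fourier_transform(t))
    exact = 1.0 / np.sqrt(1.0 + t**2)
    print(f"t = {t:5.1f}   |transform| = {approx:.6f}   1/sqrt(1+t^2) = {exact:.6f}")
```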


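Lemma 28.4 Let $E$ be the spectral family of the self-adjoint operator $H$ and
introduce the projector $P_n = E_n - E_{-n}$. If all the operators


\[ F_{<R}P_n, \qquad n \in \mathbb{N}, \quad 0 < R < \infty \]


are compact in $\mathcal{H} = L^2(\mathbb{R}^3)$, then $\mathcal{H}_{ac}(H) \subseteq M_s(H)$.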


Proof Suppose $\psi \in \mathcal{H}_{ac}(H)$ is given. Then, by the previous lemma, $U_t\psi$ converges
weakly to 0. Since $F_{<R}P_n$ is assumed to be a compact operator, it maps this sequence
onto a strongly convergent sequence, therefore


\[ \lim_{|t|\to\infty} \|F_{<R}P_nU_t\psi\| = 0. \]


Given $\varepsilon > 0$ there is an $n \in \mathbb{N}$ such that $\|(I - P_n)\psi\| < \varepsilon/2$. This number $n$ we use
in the following estimate, for any $0 < R < \infty$:


\[
\begin{aligned}
\|F_{<R}U_t\psi\| &\le \|F_{<R}(I - P_n)U_t\psi\| + \|F_{<R}P_nU_t\psi\|\\
&\le \|(I - P_n)\psi\| + \|F_{<R}P_nU_t\psi\| < \varepsilon/2 + \|F_{<R}P_nU_t\psi\|.
\end{aligned}
\]


Now we see that $\psi$ satisfies the characterization (28.10) of scattering states and we
conclude.
Certainly, it is practically impossible to verify the hypothesis of the last lemma 
directly. But this lemma can be used to arrive at the same conclusion under more 
concrete hypotheses. The following theorem gives a simple example for this. 


Theorem 28.6 Suppose that for the self-adjoint Hamiltonian $H$ in $\mathcal{H} = L^2(\mathbb{R}^3)$
there are $q \in \mathbb{N}$ and $z \in \rho(H)$ such that the operator


\[ F_{<R}(H - zI)^{-q} \]

is compact for every $0 < R < \infty$. Then $\mathcal{H}_{ac}(H) \subseteq M_s(H)$ holds.


Proof Write

\[ F_{<R}P_n = F_{<R}(H - zI)^{-q}\,(H - zI)^{q}P_n \]


and observe that $(H - zI)^{q}P_n$ is a bounded operator (this can be seen by functional
calculus). The product of a compact operator with a bounded operator is compact
(Theorem 25.4). Thus, we can apply Lemma 28.4 and conclude.

There are by now quite a number of results available which give sufficient
conditions on the Hamiltonian $H$ ensuring that the singular continuous subspace
$\mathcal{H}_{sc}(H)$ is trivial. But the proof of these results is usually quite involved and is
beyond the scope of this introduction. A successful strategy is to use restrictions on $H$,
which imply estimates for the range of its spectral projections, for instance


\[ \operatorname{ran} E(a, b) \subseteq \mathcal{H}_{ac}(H). \]


A detailed exposition of this and related theories is given in the books [3, 5-7]. We 
mention without proof one of the earliest results in this direction. 


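Theorem 28.7 For the Hamilton operator $H = H_0 + V$ in the Hilbert space
$\mathcal{H} = L^2(\mathbb{R}^3)$ assume

a) $\|V\|_R^2 = \int\!\!\int \frac{|V(x)|\,|V(y)|}{|x - y|^2}\, dx\, dy < (4\pi)^2$ or

b) $\|e^{a|\cdot|}V\|_R < \infty$ for some $a > 0$.


Then the singular subspace of $H$ is trivial: $\mathcal{H}_s(H) = \{0\}$, hence in particular
$\mathcal{H}_{sc}(H) = \{0\}$, and there are no eigenvalues.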

A more recent and fairly comprehensive discussion of the existence of bound
states and of the number of bound states of Schrödinger operators is given in [8, 9].




28.4 Probabilistic Description of Commuting Observables 


Recall from our discussion the spectral representation


\[ A = \int_{\sigma(A)} \lambda\, dE_\lambda \]


of a self-adjoint operator $A$. If $A$ represents an observable $a$ of a quantum mechanical
system, which is in the state $\psi$, then


\[ \langle A\rangle_\psi = \langle\psi, A\psi\rangle = \int_{\sigma(A)} \lambda\, d\langle\psi, E_\lambda\psi\rangle \]


is the expectation value of $a$ in the state $\psi$, and thus $\langle\psi, E_\lambda\psi\rangle$ is the probability for
a value of $a$ in the state $\psi$ which is smaller than or equal to $\lambda$. And the observable $a$
has a value $\lambda$ with probability one if $\psi$ is an eigenvector of $A$ for the eigenvalue $\lambda$.

Given a state $\psi$, an observable $a$ may be regarded as a random variable on the
probability space $(\Omega = \mathbb{R}, \mathcal{B}, P_\psi^a(d\lambda))$ where $\mathcal{B}$ denotes the Borel $\sigma$-algebra on $\mathbb{R}$
and $P_\psi^a(d\lambda) = \langle\psi, dE_\lambda\psi\rangle$. Similarly, any number of (strongly) commuting
self-adjoint operators can be regarded as random variables on an appropriate probability
space. But not all observables of a quantum system are commuting and a famous
theorem of von Neumann [10] shows that the set of all observables of a quantum
system in a given state $\psi$ cannot be regarded as a family of random variables on a
common probability space (for a short proof see [11]).


28.5 Exercises 


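1. Prove Part (c) of Theorem 28.1.
Hints: Given $f, g \in C_b(\mathbb{R})$ and $x, y \in \mathcal{H}$ show first that $\langle x, g(A)f(A)y\rangle =
\langle\bar{g}(A)x, f(A)y\rangle = \lim_{n\to\infty}\langle\bar{g}_n(A)x, f_n(A)y\rangle$ with continuous functions $f_n$, $g_n$
with support in $[-n, n]$. Then prove


\[ \langle\bar{g}_n(A)x, f_n(A)y\rangle = \lim \langle\Sigma(\bar{g}_n, Z)x, \Sigma(f_n, Z)y\rangle \]


where the approximations $\Sigma(f_n, Z)$ are defined in Eq. (27.20). $Z$ is a partition of
the interval $[-n, n]$. Then use orthogonality of different projectors $E(t_{j-1}, t_j]$ to
show


\[ \langle\Sigma(\bar{g}_n, Z)x, \Sigma(f_n, Z)y\rangle = \langle x, \Sigma(g_n \cdot f_n, Z)y\rangle. \]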


2. Prove the spectral mapping theorem, Part (4) of Theorem 28.1.
Hints: For $z \notin \sigma(g(A))$ the resolvent has the representation
$R_{g(A)}(z) = \int_{\mathbb{R}} (g(t) - z)^{-1}\, dE_t$.

3. Denote by $a$ the countable set of all points $t_j$ at which the spectral family $E$ has
a jump. Show: $E(\{t\})E(a^c)x = 0 \quad \forall t \in \mathbb{R}$.




4. Let $A$ be a self-adjoint operator in the complex Hilbert space $\mathcal{H}$. Show that $\mathcal{H}_{ac}(A)$
is a closed linear subspace of $\mathcal{H}$.

5. Let $A$ be a self-adjoint operator in the complex Hilbert space $\mathcal{H}$ and $\mathcal{H}_0$ a reducing
subspace. Prove that the restriction $A_0$ of $A$ to this subspace is a self-adjoint
operator in $\mathcal{H}_0$.

6. Complete the proof of Theorem 28.3. 

7. Prove Lemma 28.1. 

8. For a self-adjoint Hamiltonian $H$ in the Hilbert space $L^2(\mathbb{R}^3)$ prove that the set
of all scattering states is a closed linear subspace.

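9. Let $E$ be the spectral family of a self-adjoint operator $H$ in the complex Hilbert
space $\mathcal{H}$. Prove Stone's formula:


\[ \frac{1}{2}\langle x, [E[a,b] + E(a,b)]x\rangle = \lim_{r\to 0,\, r>0} \frac{1}{\pi}\int_a^b \operatorname{Im}\langle x, (H - (t + ir)I)^{-1}x\rangle\, dt \]


for all $x \in \mathcal{H}$ and all $-\infty < a < b < \infty$.
Hints: Prove first that the functions $g_r$, $r > 0$, defined by


\[ g_r(t) = \frac{1}{2i\pi}\int_a^b \left( \frac{1}{s - t - ir} - \frac{1}{s - t + ir} \right) ds \]


have the following properties: This family is uniformly bounded and


\[ \lim_{r\to 0,\, r>0} g_r(t) = \begin{cases} 0 & \text{if } t \notin [a,b],\\ 1/2 & \text{if } t \in \{a,b\},\\ 1 & \text{if } t \in (a,b). \end{cases} \]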

References 


1. Halmos PR. What does the spectral theorem say? Am Math Mon. 1963;70:246-7.

2. Rudin W. Principles of mathematical analysis. Weinheim: Physik-Verlag; 1980.

3. Reed M, Simon B. Scattering theory. Methods of modern mathematical physics. Vol. 3. New
York: Academic Press; 1979.

4. Baumgaertel H, Wollenberg M. Mathematical scattering theory. Basel: Birkhäuser; 1983.

5. Reed M, Simon B. Analysis of operators. Methods of modern mathematical physics. Vol. 4.
New York: Academic Press; 1978.

6. Amrein WO, Sinha KB. Scattering theory in quantum mechanics: physical principles and
mathematical methods. Lecture notes and supplements in physics. Vol. 16. Reading: Benjamin;
1977.

7. Pearson DB. Quantum scattering and spectral theory. Techniques of physics. Vol. 9. London:
Academic Press; 1988.

8. Blanchard Ph, Stubbe J. Bound states for Schrödinger Hamiltonians: phase space methods and
applications. Rev Math Phys. 1996;8:503-47.

9. Grosse H, Martin A. Particle physics and the Schrödinger equation. Cambridge monographs
on particle physics, nuclear physics and cosmology. Vol. 6. Cambridge: Cambridge University
Press; 1997.

10. von Neumann J. Mathematical foundations of quantum mechanics. Investigations in physics.
2nd print. Vol. 2. Princeton: Princeton University Press; 1967.

11. Blanchard Ph, Combe P, Zheng W. Mathematical and physical aspects of stochastic mechanics.
Lecture notes in physics. Vol. 281. Berlin: Springer; 1987.


Chapter 29 
Spectral Analysis in Rigged Hilbert Spaces 


29.1 Rigged Hilbert Spaces 


29.1.1 Motivation for the Use of Generalized Eigenfunctions 


In Chaps. 24-27 we have presented various details of the spectral analysis of bounded
and unbounded linear operators in a separable Hilbert space $\mathcal{H}$. According to the
Spectral Theorem of Chap. 27 every self-adjoint operator $A$ has a unique spectral
representation


\[ Ax = \int_{\mathbb{R}} t\, dE_t x \]


for all $x \in \mathcal{H}$ for which $\int_{\mathbb{R}} t^2\, d\|E_t x\|^2$ is finite with a unique spectral family $E_t = E_t^A$,
$t \in \mathbb{R}$. And in Chaps. 24 and 28 we investigated the various parts of the spectrum
$\sigma(A)$ of a self-adjoint operator $A$. As a first splitting of the spectrum we found one
into two parts, namely into the point spectrum $\sigma_p(A)$ and the continuous spectrum
$\sigma_c(A)$. In concrete applications it is often important to have a complete system of
eigenfunctions as in the finite dimensional case, hence in particular eigenfunctions
for all points in the spectrum, and we saw that for $\lambda \in \sigma_p(A)$ there is an eigenfunction
$\psi_\lambda \in D(A) \subset \mathcal{H}$ such that $A\psi_\lambda = \lambda\psi_\lambda$ but not for points in the continuous
spectrum.

In physics there is an elegant and much used formalism to use eigenfunctions
associated with all points in the spectrum, namely the famous bra and ket formalism
of Dirac. This formalism associates eigenfunctions also to points which do not
belong to the point spectrum and thus have no proper interpretation in the Hilbert space
context. But these “improper” eigenfunctions have been used with great success for
quite some time. This indicates that the purely Hilbert space framework for spectral
analysis might be too narrow. Recall: An important role of (rigorous) mathematics
in physics is to make good sense of heuristic ideas, i.e., to help to find appropriate
settings in which to formulate these ideas in precise terms and to establish their
consistency. Two famous examples of mathematicians who have created paradigmatic
examples for this statement are L. Schwartz and his treatment of Dirac's delta
function and J. von Neumann and his introduction of operator algebras in connection
with the (mathematical) foundations of quantum physics.


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_29

A simple concrete example can guide us how to extend the Hilbert space
framework. Consider the momentum operator $P = \frac{1}{i}\frac{d}{dx}$ in $\mathcal{H} = L^2(\mathbb{R})$ with the Sobolev space
$H^1(\mathbb{R}) = \{f \in L^2(\mathbb{R}) : p\tilde{f}(p) \in L^2(\mathbb{R})\}$ as its natural domain: $D(P) = H^1(\mathbb{R})$,
where $\tilde{f}$ denotes the Fourier transform of $f$. It is known that $P$ is self-adjoint on
this domain. Note also that $P$ maps the test function space $\mathcal{S}(\mathbb{R})$ continuously into
itself and that $\mathcal{S}(\mathbb{R})$ is densely and continuously embedded into the Hilbert space
$\mathcal{H} = L^2(\mathbb{R})$. The operator $P$ is unitarily equivalent to the operator $M_{id}$ of multiplication
with the variable via the Fourier transformation $\mathcal{F}_2$ on $L^2(\mathbb{R})$, i.e., $\mathcal{F}_2 P \mathcal{F}_2^{-1} = M_{id}$,
and hence its spectrum is the real line, $\sigma(P) = \mathbb{R}$ (see Part 2 of Example 24.2.1).
But for no $q \in \mathbb{R}$ is there a $\psi_q \in D(P)$ such that $P\psi_q(x) = \frac{1}{i}\frac{d\psi_q}{dx}(x) = q\psi_q(x)$.
Certainly, this simple differential equation has the solutions $e_q(x) = c\,e^{iqx}$ with
constants $c$, but for $c \neq 0$ none of them is square integrable. This system of eigenfunctions
is complete in $L^2(\mathbb{R})$ in the sense that
every $f \in L^2(\mathbb{R})$ has a representation in terms of these eigenfunctions, namely in
terms of the Fourier transform.

So one should enlarge the Hilbert space $L^2(\mathbb{R})$ so that this enlargement can
accommodate these solutions. This is done by introducing a suitable rigged Hilbert
space. These eigenfunctions $e_q$ can be considered as regular distributions $F_q = I_{e_q}$
in the space $\mathcal{S}'(\mathbb{R})$ of tempered distributions. Then one has for all $\phi \in \mathcal{S}(\mathbb{R})$


\[ PF_q(\phi) = F_q(P'\phi) = F_q(-P\phi) = qF_q(\phi) \]


where $P'$ is the adjoint of $P$ in the sense of Sect. 8.3. Thus $F_q$ is an eigenfunction of
$P$ in the sense of tempered distributions.
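
The identity $F_q(-P\phi) = qF_q(\phi)$ can be checked numerically for a single test function. The following sketch is illustrative only and rests on assumptions: $P = \frac{1}{i}\frac{d}{dx}$ (so that $-P\phi = i\phi'$), the plane wave $e_q(x) = e^{iqx}$ with $c = 1$, the Gaussian test function $\phi(x) = e^{-x^2/2}$, and the pairing $F_q(\phi) = \int e^{iqx}\phi(x)\,dx$ approximated by a Riemann sum on $[-20, 20]$. The names `pairing` and `residual` are hypothetical.

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2.0)          # Gaussian test function in S(R)
dphi = -x * phi                    # its derivative, computed analytically

def pairing(q, f):
    """Riemann-sum approximation of F_q(f), the integral of e^{iqx} f(x) dx."""
    return np.sum(np.exp(1j * q * x) * f) * dx

def residual(q):
    """|F_q(-P phi) - q F_q(phi)| with -P phi = i phi'."""
    return abs(pairing(q, 1j * dphi) - q * pairing(q, phi))

for q in (-2.0, 0.0, 0.5, 3.0):
    print(f"q = {q:5.2f}   residual = {residual(q):.2e}")
```

For this Gaussian the pairing is known in closed form, $F_q(\phi) = \sqrt{2\pi}\,e^{-q^2/2}$, which offers an independent check of the quadrature.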


29.1.2 Rigged Hilbert Spaces 


The following definition of a rigged Hilbert space or Gelfand triple is modelled on 
the above example. In this definition one important property enters which we have 
not yet mentioned for the test function space S(R) and not for the other test function 
spaces which were discussed in the first part of this volume, namely that these spaces 
are nuclear, but nuclearity is essential in the definition and successful application of 
a Gelfand triple. Accordingly we start with presenting basic facts of nuclear spaces. 

Consider a Hausdorff locally convex topological vector space $\Phi$ (in the sense
of Chap. 2 of Part I) whose filtering system of seminorms is countable and each
seminorm is actually a norm which is defined by a scalar product $h_n$ on $\Phi$, $n \in \mathbb{N}$.
Without loss of generality we can assume that these scalar products are ordered in
the sense that


\[ h_n(f, f) \le h_{n+1}(f, f) \quad \text{for all } f \in \Phi,\ n \in \mathbb{N}. \]


Then the completion of $\Phi$ with respect to the norm $\|\cdot\|_n = \sqrt{h_n(\cdot, \cdot)}$ is a Hilbert space
$\Phi_n$ (see Appendix A) whose scalar product is still denoted by $h_n$ and for $n > m$ one
has continuous dense embeddings

\[ \Phi_n \hookrightarrow \Phi_m \]


denoted by $T_{mn}$. Such a space $\Phi$ is called nuclear¹ if, and only if, for any $m \in \mathbb{N}$
there is $n > m$ such that the embedding map $T_{mn}$ is of trace class or nuclear, i.e., is
of the form


\[ T_{mn}(f) = \sum_{k=1}^{\infty} \lambda_k\, h_n(e_k, f)\, f_k \tag{29.1} \]


where $(e_k)$ respectively $(f_k)$ are orthonormal sequences in $\Phi_n$ respectively in $\Phi_m$ and
where the series $\sum_{k=1}^{\infty} \lambda_k$ converges and $\lambda_k > 0$.

Note that $f \mapsto h_n(e_k, f)$ defines a continuous linear functional $F_k : \Phi_n \to \mathbb{C}$,
i.e., an element in the dual space $\Phi_n'$ of the Hilbert space $\Phi_n$, and thus $\Phi_n'$ is itself a
Hilbert space, often denoted by $\Phi_{-n}$: $\Phi_n' = \Phi_{-n}$, with norm


\[ \|F\|_{-n} = \sup\{|F(f)| : f \in \Phi_n,\ \|f\|_n = 1\}. \]


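Next consider a scalar product $h_0$ on $\Phi$ such that $h_0(f, f) \le h_1(f, f)$ for all $f \in \Phi$.
Define a Hilbert space $\mathcal{H}$ as the completion of $\Phi$ with respect to the norm $\|\cdot\|_0$ defined
by this scalar product. Thus we have the chain of continuous and dense embeddings


\[ \Phi_{n+1} \hookrightarrow \Phi_n \hookrightarrow \cdots \hookrightarrow \Phi_1 \hookrightarrow \mathcal{H}, \qquad n \in \mathbb{N}. \]


Note that by construction

\[ \Phi = \bigcap_{n\in\mathbb{N}} \Phi_n. \]

Since $\Phi$ is continuously embedded into $\mathcal{H}$, every continuous linear functional on $\mathcal{H}$
is also a continuous linear functional on $\Phi$, i.e.,

\[ \mathcal{H}' \hookrightarrow \Phi'. \]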

If $T$ denotes the embedding of $\Phi$ into $\mathcal{H}$, its adjoint $T'$ gives the embedding of the
dual spaces. By Corollary 15.3.1 the Hilbert space $\mathcal{H}$ and its dual $\mathcal{H}'$ are isometrically
anti-isomorphic. Here it is convenient to identify $\mathcal{H}$ and $\mathcal{H}'$. Then the embedding
$T' : \mathcal{H} \to \Phi'$ has to be an antilinear map and we arrive at the chain of embeddings

\[ \Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi' \tag{29.2} \]

where the second embedding is antilinear.


Definition 29.1 A triple of spaces $\Phi, \mathcal{H}, \Phi'$, where $\Phi$ is a nuclear space and $\mathcal{H}$ a
Hilbert space for which the embedding relation (29.2) holds, is called a rigged Hilbert
space or a Gelfand triple.


¹ Nuclear spaces were introduced in [1] and are studied in detail in many modern books on functional
analysis.




29.1.3 Examples of Nuclear Spaces 


Many of the (function) spaces used in (functional) analysis are actually nuclear. Here 
we just illustrate this important concept with some basic examples. For details we 
have to refer to specialized literature: [2-5]. 


29.1.3.1 The Sequence Space G 


Denote by $G = G(\mathbb{C})$ the space of all sequences $c = (c_n)$ of complex numbers $c_n$
such that for every $s \in \mathbb{N}$

\[ \sup_n n^s |c_n| < \infty. \]


On $G$ define a sequence of scalar products $h_n$ by

\[ h_n(c,d) = \sum_{j=1}^{\infty} j^{n}\, \overline{c_j}\, d_j. \]


Obviously these scalar products satisfy $h_n(c,c) \le h_{n+1}(c,c)$ for all $n = 1,2,\ldots$.
Denote by $G_n$ the Hilbert space obtained by completion of $G$ with respect to the
scalar product $h_n$. This gives the chain of embeddings

\[ G_{n+1} \hookrightarrow G_n \hookrightarrow \cdots \hookrightarrow G_0 = \ell^2(\mathbb{C}) \]

and

\[ G = \bigcap_n G_n. \]


An orthonormal basis of $G_n$ is $e^{(k)} = (\delta_{jk}\, k^{-n/2})_{j \in \mathbb{N}}$, $k = 1,2,\ldots$. Now given $m \in \mathbb{N}$
choose $n > m + 2$. We claim that the embedding $T_{mn}$ of $G_n$ into $G_m$ is a Hilbert-Schmidt
operator. To prove this calculate

\[ \sum_{k=1}^{\infty} \|T_{mn} e^{(k)}\|_m^2 = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} j^{m}\, |e^{(k)}_j|^2 = \sum_{k=1}^{\infty} k^{m-n} < \infty, \]

hence $T_{mn}$ is a Hilbert-Schmidt operator for $n > m + 2$. Since the product of two
Hilbert-Schmidt operators is nuclear, nuclearity of the sequence space $G$ follows.
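The convergence behind the Hilbert-Schmidt estimate can be checked numerically. The following is only a minimal sketch, assuming the scalar products have the form $h_n(c,d) = \sum_j j^n \overline{c_j} d_j$, so that the squared Hilbert-Schmidt norm of $T_{mn}$ is $\sum_k k^{m-n}$; the function name `hs_norm_squared` is hypothetical and chosen for illustration.

```python
import math


def hs_norm_squared(m, n, terms):
    """Partial sum of sum_k k**(m - n), the squared Hilbert-Schmidt norm of T_mn
    under the assumed weights j**n in the scalar product h_n."""
    return sum(k ** (m - n) for k in range(1, terms + 1))


if __name__ == "__main__":
    # n = m + 2: partial sums stay bounded and approach pi**2 / 6.
    # n = m + 1: partial sums grow like log(terms), so T_mn is not Hilbert-Schmidt.
    for terms in (10**2, 10**4, 10**5):
        print(terms, hs_norm_squared(1, 3, terms), hs_norm_squared(1, 2, terms))
    print("limit for n = m + 2:", math.pi**2 / 6)
```

The partial sums for $n = m + 1$ keep growing, which suggests why a gap of at least two indices is needed in this example.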


29.1.3.2 The Test Function Space S(R) 


Recall that we had introduced this test function space as the space of strongly
decreasing $C^{\infty}$ functions on $\mathbb{R}$ with the filtering system of norms

\[ p_{m,k}(f) = \sup_{x \in \mathbb{R},\ |\alpha| \le k} (1 + x^2)^{m/2}\, |D^{\alpha} f(x)|, \quad m, k \in \mathbb{N}. \]

Take the Hamilton operator of the quantum harmonic oscillator (see Sect. 16.3),
$H_0 = \frac{1}{2}\left(-\frac{d^2}{dx^2} + x^2\right)$, and define $H = 2H_0$. Clearly this operator maps $\mathcal{S}(\mathbb{R})$ into
itself and is bounded from below: $H \ge I$. Now define a sequence of scalar products
on $\mathcal{S}(\mathbb{R})$ by

\[ h_n(f,g) = (f, H^n g), \quad f, g \in \mathcal{S}(\mathbb{R}), \]

where $(\cdot,\cdot)$ denotes the scalar product of $L^2(\mathbb{R})$. It requires some lengthy calculations
to show explicitly that the filtering system of norms defined by these scalar products
is equivalent to the original filtering system of the $p_{m,k}$, though intuitively this is
quite obvious.

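Since $H \ge I$ it follows easily that $h_n(f,f) \le h_{n+1}(f,f)$ for all $f \in \mathcal{S}(\mathbb{R})$ and
all $n$. The completion of $\mathcal{S}(\mathbb{R})$ with respect to $h_n$ yields a Hilbert space $\mathcal{S}_n(\mathbb{R})$ and
the chain of embeddings

\[ \mathcal{S}_{n+1}(\mathbb{R}) \hookrightarrow \mathcal{S}_n(\mathbb{R}) \hookrightarrow \cdots \hookrightarrow \mathcal{S}_0(\mathbb{R}) = L^2(\mathbb{R}). \]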


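According to Sect. 16.3 the Hermite functions $\psi_j$ are the eigenfunctions of $H_0$ for
the eigenvalue $j + 1/2$, $j = 0,1,2,\ldots$. They form an orthonormal basis of $L^2(\mathbb{R})$
and are elements of $\mathcal{S}(\mathbb{R})$. It follows that the functions

\[ e_j = H^{-n/2}\psi_j = (2j+1)^{-n/2}\,\psi_j, \quad j = 0,1,2,\ldots \]

form an orthonormal basis of the Hilbert space $\mathcal{S}_n(\mathbb{R})$. For given $m \in \mathbb{N}$ choose
$n > m + 2$ and calculate

\[ \sum_{j=0}^{\infty} \|T_{mn} e_j\|_m^2 = \sum_{j=0}^{\infty} h_m(e_j, e_j) = \sum_{j=0}^{\infty} (H^{-n/2}\psi_j,\ H^{m} H^{-n/2}\psi_j) = \sum_{j=0}^{\infty} (2j+1)^{m-n}. \]

Since $n > m + 2$ this series converges and thus $T_{mn}$ is a Hilbert-Schmidt operator
$\mathcal{S}_n(\mathbb{R}) \to \mathcal{S}_m(\mathbb{R})$ and we conclude that $\mathcal{S}(\mathbb{R})$ is nuclear.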
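The argument rests on the Hermite functions being eigenfunctions of $H = 2H_0 = -\frac{d^2}{dx^2} + x^2$ with eigenvalues $2j+1$. The sketch below checks this relation numerically with a central finite difference; it is an illustration under stated assumptions, not part of the proof. The helper names `hermite_function` and `apply_H` and the step size are hypothetical choices.

```python
import math


def hermite_function(j, x):
    """Normalized Hermite function psi_j(x), computed with the three-term recurrence
    psi_{k+1} = sqrt(2/(k+1)) * x * psi_k - sqrt(k/(k+1)) * psi_{k-1}."""
    psi_prev = 0.0
    psi = math.pi ** -0.25 * math.exp(-x * x / 2)
    for k in range(j):
        psi_prev, psi = psi, (
            math.sqrt(2.0 / (k + 1)) * x * psi - math.sqrt(k / (k + 1)) * psi_prev
        )
    return psi


def apply_H(j, x, step=1e-3):
    """Approximate (H psi_j)(x) for H = -d^2/dx^2 + x^2 using a central difference."""
    second = (
        hermite_function(j, x + step)
        - 2 * hermite_function(j, x)
        + hermite_function(j, x - step)
    ) / step**2
    return -second + x * x * hermite_function(j, x)


if __name__ == "__main__":
    for j in range(4):
        x = 0.7
        print(j, apply_H(j, x), (2 * j + 1) * hermite_function(j, x))
```

With these eigenvalues the Hilbert-Schmidt sum becomes $\sum_j (2j+1)^{m-n}$, which converges as soon as $n - m \ge 2$.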


29.1.4 Structure of the Natural Embedding in a Gelfand Triple 


The assumption of nuclearity of the space $\Phi$ in the definition of a rigged Hilbert
space $\Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi'$ has a very important implication for the structure of the
natural embedding $T : \Phi \hookrightarrow \mathcal{H}$ which we investigate now.


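Theorem 29.1 Suppose that $\Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi'$ is a rigged Hilbert space and denote
by $T$ the natural embedding of $\Phi$ into $\mathcal{H}$. Then there is $n \in \mathbb{N}$ and there are an
orthonormal basis $\{f_k\}$ of $\mathcal{H}$, an orthonormal basis $\{F_k\}$ of $\Phi_n'$, and there are numbers
$\lambda_k \ge 0$ with $\sum_k \lambda_k < \infty$ such that for every $\phi \in \Phi$

\[ T(\phi) = \sum_{k=1}^{\infty} \lambda_k\, F_k(\phi)\, f_k. \tag{29.3} \]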


Proof According to our definition of a Gelfand triple the space $\Phi_n$ is the completion
of $\Phi$ with respect to the norm defined by the inner product $h_n$, in particular $\Phi_0 = \mathcal{H}$.
Since $\Phi$ is assumed to be nuclear there is $n \in \mathbb{N}$ such that the embedding
$T_{0n} : \Phi_n \to \Phi_0$ is nuclear, i.e., of the form (29.1):

\[ T_{0n}(\phi) = \sum_{k=1}^{\infty} \lambda_k\, h_n(e_k, \phi)\, f_k \]

where $\{e_k\}$ is an orthonormal basis of $\Phi_n$ and $\{f_k\}$ an orthonormal basis of $\Phi_0$
and where the nonnegative numbers $\lambda_k$ satisfy $\sum_k \lambda_k < \infty$. As remarked before,
$F_k(\phi) = h_n(e_k, \phi)$ defines a continuous linear functional on the Hilbert space $\Phi_n$ and
by construction the set $\{F_k\}$ is an orthonormal basis of $\Phi_n'$. Since for every $\phi \in \Phi$
one has $T(\phi) = T_{0n}(\phi)$ the representation formula (29.3) follows.


Theorem 29.2 Suppose that in a rigged Hilbert space $\Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi'$ the Hilbert
space $\mathcal{H}$ has a realization as a space of square integrable functions with respect to some
Borel measure $\mu$ on some set $X$, i.e., there is a unitary map $U : \mathcal{H} \to L^2(X, \mu)$.
Then there is a map $F : X \to \Phi'$, $x \mapsto F_x$, such that for all $\phi \in \Phi$ one has for
$\mu$-almost all $x \in X$

\[ F_x(\phi) = (U\phi)(x). \tag{29.4} \]


Proof Formula (29.3) implies that the embedding $UT : \Phi \to L^2(X, \mu)$ is of the
form

\[ UT(\phi) = \sum_{k=1}^{\infty} \lambda_k\, F_k(\phi)\, h_k, \tag{29.5} \]

where $\{h_k = U f_k\}$ is an orthonormal basis of $L^2(X, \mu)$.
Statement 1: For $\mu$-almost all $x \in X$ the series

\[ F_x = \sum_{k=1}^{\infty} \lambda_k\, h_k(x)\, F_k \tag{29.6} \]

converges in $\Phi_n'$.
Since $\Phi_n'$ is complete it suffices to show that the sequence of partial sums $S_L(x)$
is a Cauchy sequence in $\Phi_n'$ for $\mu$-almost all $x \in X$. For $M > L$ we estimate

\[ \|S_M(x) - S_L(x)\|_n' \le \sum_{k=L+1}^{M} \lambda_k\, |h_k(x)|\, \|F_k\|_n'. \]

Thus, since $\|F_k\|_n' = 1$, it suffices to show that the numerical series

\[ \sum_k \lambda_k\, |h_k(x)| \]

converges for $\mu$-almost all $x \in X$. Recall that $\{h_k\}$ is an orthonormal sequence in
$L^2(X, \mu)$, therefore we know

\[ \int_X \sum_k \lambda_k\, |h_k(x)|^2\, d\mu(x) = \sum_k \lambda_k \int_X |h_k(x)|^2\, d\mu(x) = \sum_k \lambda_k < \infty, \]

hence the series $\sum_k \lambda_k |h_k(x)|^2$ of positive terms converges for $\mu$-almost all $x \in X$.
The Cauchy-Schwarz inequality implies

\[ \Big( \sum_k \lambda_k\, |h_k(x)| \Big)^2 \le \sum_k \lambda_k\ \sum_k \lambda_k\, |h_k(x)|^2 \]

which shows that the series $\sum_k \lambda_k |h_k(x)|$ converges for $\mu$-almost all $x \in X$. This
proves Statement 1.

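Statement 2: Statement (29.4) holds.

As a consequence of Statement 1 we know for every $\phi \in \Phi$ and for $\mu$-almost all
$x \in X$

\[ F_x(\phi) = \sum_{k=1}^{\infty} \lambda_k\, h_k(x)\, F_k(\phi) \]

and, since the $h_k$ are an orthonormal system in $L^2(X, \mu)$, it follows that the function
$x \mapsto F_x(\phi)$ belongs to $L^2(X, \mu)$ and

\[ \int_X |F_x(\phi)|^2\, d\mu(x) = \sum_k \lambda_k^2\, |F_k(\phi)|^2. \]

Since $T$ is an embedding we know $U\phi = UT\phi$ and thus by (29.5), for $\mu$-almost all
$x \in X$

\[ (U\phi)(x) = \sum_{k=1}^{\infty} \lambda_k\, F_k(\phi)\, h_k(x). \]

Hence $x \mapsto F_x(\phi)$ and $U\phi$ agree as elements of $L^2(X, \mu)$ and thus (29.4) follows.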


29.2 Spectral Analysis of Self-adjoint Operators 
and Generalized Eigenfunctions 


29.2.1 Direct Integral of Hilbert Spaces 


Recall the construction of direct sums of (a countable family of) separable Hilbert 
spaces in Chap. 18. For the general eigenfunction expansion for self-adjoint operators 
we need a continuous analogue, called the direct integral of (a continuous family of) 
Hilbert spaces. For this construction some basic measure theory is needed. 

For a topological space $\Lambda$ the Borel $\sigma$-algebra $\Sigma = \Sigma(\Lambda)$ is the family of all
subsets of $\Lambda$ generated by countable unions and intersections, and relative complements
of open sets. Any measure $\mu$ on $\Sigma$ is called a Borel measure and $(\Lambda, \mu)$ a
Borel space. A Borel measure on $\Lambda$ is $\sigma$-finite if $\Lambda$ has a countable decomposition
into measurable sets of finite measure.

For further details and proofs of the following definitions and results we have to 
refer to Sect. IV.8 of [6]. A slightly different approach is presented in [2], Sect. 1.4. 




Definition 29.2 Suppose that a Borel space $\Lambda$ with a $\sigma$-finite Borel measure $\mu$ is
given. A measurable field of Hilbert spaces on $(\Lambda, \mu)$ is a family $\{\mathcal{H}(\lambda) : \lambda \in \Lambda\}$
of Hilbert spaces indexed by $\Lambda$ together with a subspace $\mathcal{M}$ of the product vector
space $\prod_{\lambda \in \Lambda} \mathcal{H}(\lambda)$ with the following specifications:

(a) For any $x \in \mathcal{M}$ the numerical function on $\Lambda$, $\lambda \mapsto \|x(\lambda)\|$, is $\mu$-measurable
where $\|\cdot\|$ is the norm on $\mathcal{H}(\lambda)$.

(b) If for any $y \in \prod_{\lambda \in \Lambda} \mathcal{H}(\lambda)$ the numerical function $\lambda \mapsto (y(\lambda), x(\lambda))$ is
$\mu$-measurable for every $x \in \mathcal{M}$, then $y$ belongs to $\mathcal{M}$. Naturally, $(\cdot,\cdot)$ denotes
here the scalar product of the Hilbert space $\mathcal{H}(\lambda)$.

(c) Existence of a fundamental sequence of $\mu$-measurable vector fields: There exists
a countable subset $\{x_n : n \in \mathbb{N}\}$ of $\mathcal{M}$ such that for every $\lambda \in \Lambda$ the set
$\{x_n(\lambda) : n \in \mathbb{N}\}$ is total in the Hilbert space $\mathcal{H}(\lambda)$.


The elements of $\mathcal{M}$ are called $\mu$-measurable vector fields.


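Lemma 29.1 Suppose that $\{\mathcal{H}(\lambda) : \lambda \in \Lambda\}$ is a measurable field of Hilbert spaces
on the Borel space $(\Lambda, \mu)$. Then the function $\lambda \mapsto N(\lambda) = \dim \mathcal{H}(\lambda)$ is measurable
on $\Lambda$ and there exists a fundamental sequence $\{x_n\}$ of measurable vector fields such
that


(a) $\{x_1(\lambda), \ldots, x_{N(\lambda)}(\lambda)\}$ is an orthonormal basis of $\mathcal{H}(\lambda)$ for every $\lambda \in \Lambda$,
(b) if $N(\lambda)$ is finite, then $x_{N(\lambda)+k}(\lambda) = 0$ for all $k = 1,2,\ldots$.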


Assume now that $\{\mathcal{H}(\lambda) : \lambda \in \Lambda\}$ is a measurable field of Hilbert spaces on the Borel
space $(\Lambda, \mu)$ with a positive $\sigma$-finite measure $\mu$. Denote by $\mathcal{H}$ the family of all
measurable vector fields $x$ such that

\[ \|x\| = \Big( \int_{\Lambda} \|x(\lambda)\|^2\, d\mu(\lambda) \Big)^{1/2} < \infty, \tag{29.7} \]

where naturally the norm of the integrand is the norm of the Hilbert space $\mathcal{H}(\lambda)$. As
usual in $L^2$ spaces addition and scalar multiplication are defined pointwise; thus $\mathcal{H}$
becomes a vector space on which a scalar product is naturally defined by

\[ (x, y) = \int_{\Lambda} (x(\lambda), y(\lambda))\, d\mu(\lambda), \quad x, y \in \mathcal{H}. \tag{29.8} \]

Clearly the scalar product of the integrand is that of $\mathcal{H}(\lambda)$. If we identify two vector
fields $x$ and $y$ whenever $x(\lambda) = y(\lambda)$ $\mu$-almost everywhere then one shows as for
standard $L^2$ spaces that $\mathcal{H}$ is complete with respect to the scalar product (29.8); thus
$\mathcal{H}$ is a Hilbert space.


Definition 29.3 The Hilbert space $\mathcal{H}$ constructed above is called the direct integral
of the measurable field of Hilbert spaces $\mathcal{H}(\lambda)$ and written as

\[ \mathcal{H} = \mathcal{H}_{\mu,N} = \int_{\Lambda}^{\oplus} \mathcal{H}(\lambda)\, d\mu(\lambda) \]

and accordingly its elements $x \in \mathcal{H}$ are written as

\[ x = \int_{\Lambda}^{\oplus} x(\lambda)\, d\mu(\lambda). \]


29.2.2 Classical Versions of Spectral Representation 


Recall the Spectral Theorem 27.5 for a self-adjoint operator $A$ in a separable Hilbert
space $\mathcal{H}$. Then, given any Borel function $f : \sigma(A) \to \mathbb{C}$ the operator $f(A)$ is
defined with the help of Theorem 27.4 by

\[ f(A) = \int_{\sigma(A)} f(\lambda)\, dE_{\lambda}, \qquad D(f(A)) = \Big\{ x \in \mathcal{H} : \int_{\sigma(A)} |f(\lambda)|^2\, d\|E_{\lambda} x\|^2 < \infty \Big\}. \tag{29.9} \]

This classical version can be used to prove the functional version of the spectral
theorem [7, 8] which we recall without proof.


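Theorem 29.3 (Spectral Theorem, Functional Form) Given a self-adjoint operator
$A$ on a separable Hilbert space $\mathcal{H}$ there exist a nonnegative Borel measure $\mu$
on the spectrum $\sigma(A)$ of $A$, a direct integral of separable Hilbert spaces $\mathcal{H}_{\lambda}$,

\[ \mathcal{H}_{\mu,N} = \int_{\sigma(A)}^{\oplus} \mathcal{H}_{\lambda}\, d\mu(\lambda), \qquad N(\lambda) = \dim(\mathcal{H}_{\lambda}), \]

and a unitary operator $U : \mathcal{H} \to \mathcal{H}_{\mu,N}$ such that the operator $UAU^{-1}$ is diagonal
on $\mathcal{H}_{\mu,N}$, i.e.

\[ UAU^{-1} = \int_{\sigma(A)}^{\oplus} \lambda\, I_{\lambda}\, d\mu(\lambda), \qquad I_{\lambda} = \mathrm{id}_{\mathcal{H}_{\lambda}}. \tag{29.10} \]

Borel functions $f(A)$ of $A$ now have the convenient representation

\[ U f(A) U^{-1} = \int_{\sigma(A)}^{\oplus} f(\lambda)\, I_{\lambda}\, d\mu(\lambda), \tag{29.11} \]

i.e., for all $x \in \mathcal{H}$ and all $y \in D(f(A))$ one has

\[ (x, f(A)y)_{\mathcal{H}} = (Ux, U f(A) U^{-1} U y)_{\mathcal{H}_{\mu,N}} = \int_{\sigma(A)} f(\lambda)\, ((Ux)(\lambda), (Uy)(\lambda))_{\mathcal{H}_{\lambda}}\, d\mu(\lambda), \]

where the scalar products in the various Hilbert spaces are indicated explicitly.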


Remark 29.1 If a self-adjoint operator $A$ has a cyclic vector, i.e., a vector $x_0 \in \mathcal{H}$
such that the linear span of $\{E(\Delta)x_0 : \Delta \subset \sigma(A), \text{ interval}\}$ is dense in $\mathcal{H}$,² then
Theorem 29.3 has a much simpler form and the data in this theorem can be constructed
easily. Define the measure $\mu$ by

\[ \mu(\Delta) = \|E(\Delta)x_0\|^2 = (x_0, E(\Delta)x_0) \]

and $U : \mathcal{H} \to L^2(\mu)$ by continuous linear extension of the isometric mapping
$E(\Delta)x_0 \mapsto \chi_{\Delta}$ where $\chi_{\Delta}$ is the characteristic function of the interval $\Delta \subset \sigma(A)$:

\[ \|E(\Delta)x_0\|^2 = \mu(\Delta) = \int |\chi_{\Delta}(\lambda)|^2\, d\mu(\lambda). \]

A similar calculation shows that for pairwise disjoint sets $\Delta_j$ and complex numbers
$a_j$, respectively sets $\Delta_i'$ and complex numbers $b_i$, one has

\[ \Big( \sum_{j=1}^{n} a_j E(\Delta_j)x_0,\ \sum_{i=1}^{m} b_i E(\Delta_i')x_0 \Big) = \int \overline{\sum_{j=1}^{n} a_j \chi_{\Delta_j}(\lambda)}\ \sum_{i=1}^{m} b_i \chi_{\Delta_i'}(\lambda)\, d\mu(\lambda) \]

so that $U$ is actually defined by continuous linear extension of the mapping $U_0$,

\[ U_0 \sum_{j=1}^{n} a_j E(\Delta_j)x_0 = \sum_{j=1}^{n} a_j \chi_{\Delta_j}. \]

Then, by inserting the spectral representation of $A$, we find for any interval $\Delta$

\[ (x_0, A E(\Delta)x_0)_{\mathcal{H}} = \int \lambda\, \chi_{\Delta}(\lambda)\, d\mu(\lambda). \]

Since arbitrary $y \in \mathcal{H}$ are the limit of sums of the form $\sum_{j=1}^{n} a_j E(\Delta_j)x_0$ with $a_j \in \mathbb{C}$
and intervals $\Delta_j \subset \sigma(A)$ this formula can be extended to the following

\[ (y, Ax)_{\mathcal{H}} = \int \overline{(Uy)(\lambda)}\ \lambda\, (Ux)(\lambda)\, d\mu(\lambda) \]

for all $y \in \mathcal{H}$ and $x \in D(A)$ which shows

\[ UAU^{-1} = \lambda I, \qquad I = \mathrm{id}_{L^2(\sigma(A),\mu)}. \]

Thus in this case one has $\mathcal{H}_{\lambda} = \mathbb{C}$ for all $\lambda \in \sigma(A)$ and the direct integral of Hilbert
spaces in Theorem 29.3 reduces to a simple Lebesgue space $L^2(\mu) = L^2(\sigma(A), d\mu)$
over the spectrum with a measure defined in terms of the spectral family of the
self-adjoint operator $A$.

² In Proposition 5.20 of [9] it is shown that this condition is equivalent to the existence of a vector
$x_0 \in \bigcap_n D(A^n)$ such that the linear span of $\{A^n x_0 : n = 0,1,2,3,\ldots\}$ is dense in $\mathcal{H}$.
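A finite-dimensional analogue may help to see how the cyclic-vector construction works. The sketch below assumes a diagonal matrix $A$ with distinct eigenvalues and a vector $x_0$ with no vanishing component (which makes $x_0$ cyclic); the spectral measure is then a sum of point masses $\mu(\{\lambda_i\}) = |(x_0)_i|^2$ and $U$ sends $x$ to the function $\lambda_i \mapsto x_i/(x_0)_i$. All function names are hypothetical.

```python
def spectral_measure(x0):
    """Point masses mu({lambda_i}) = |<e_i, x0>|^2 for a diagonal matrix A."""
    return [abs(c) ** 2 for c in x0]


def U(x, x0):
    """Image of x in L^2(mu): the function lambda_i -> x_i / (x0)_i."""
    return [xi / ci for xi, ci in zip(x, x0)]


def inner(u, v):
    """Scalar product in C^d, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def inner_mu(f, g, mu):
    """Scalar product in L^2(mu) for a measure made of point masses."""
    return sum(a.conjugate() * b * w for a, b, w in zip(f, g, mu))


if __name__ == "__main__":
    eigenvalues = [1.0, 2.0, 5.0]          # spectrum of the diagonal matrix A
    x0 = [0.5, 0.5j, 0.5 ** 0.5]           # unit vector, all components nonzero
    mu = spectral_measure(x0)
    x = [1 + 2j, -1.0, 0.5j]
    y = [2.0, 1j, 3 - 1j]
    Ax = [lam * xi for lam, xi in zip(eigenvalues, x)]
    lhs = inner(y, Ax)                      # (y, A x) in the original space
    rhs = inner_mu(U(y, x0), [lam * f for lam, f in zip(eigenvalues, U(x, x0))], mu)
    print(lhs, rhs)                         # both sides of U A U^{-1} = multiplication by lambda
```

In this toy setting $U$ is an isometry onto $L^2(\mu)$ and $A$ becomes multiplication by $\lambda$, mirroring the statement $UAU^{-1} = \lambda I$.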


Remark 29.2 In general a self-adjoint operator $A$ on a separable Hilbert space
$\mathcal{H}$ has no cyclic vector. Then a much more involved construction is needed since
now substantial parts of spectral multiplicity theory for self-adjoint operators are
required. This naturally is beyond the scope of our brief discussion. A comprehensive
exposition of this theory can for instance be found in [8]. This general case relies on
the use of a direct integral of Hilbert spaces. Here we indicate the first steps. At first
the given separable Hilbert space $\mathcal{H}$ is decomposed into spectral subspaces:

Choose a unit vector $x_1 \in \mathcal{H}$ and define a Borel measure $\mu_1$ as above by $\mu_1(\Delta) =
\|E(\Delta)x_1\|^2$ and a Hilbert space $\mathcal{H}_1$ as the closed linear subspace of $\mathcal{H}$ generated by
$\{E(\Delta)x_1 : \Delta \subset \sigma(A), \text{ interval}\}$. Again there is a unitary map $U_1 : \mathcal{H}_1 \to L^2(\mu_1)$.
Next select a unit vector $x_2 \in \mathcal{H}_1^{\perp}$ and do the same construction with $x_2$ replacing $x_1$
to get a Hilbert subspace $\mathcal{H}_2$, a Borel measure $\mu_2$ and a unitary map $U_2 : \mathcal{H}_2 \to
L^2(\mu_2)$. By repeating this construction we arrive at the decomposition

\[ \mathcal{H} = \bigoplus_{j=1}^{N} \mathcal{H}_j \cong \bigoplus_{j=1}^{N} L^2(\mu_j) \]

with $N \in \mathbb{N}$ or $N = \infty$, since $\mathcal{H}$ is separable.
Then one has to define a suitable direct integral of Hilbert spaces $\mathcal{H}_{\mu,N}$ and has
to construct a unitary map

\[ U : \bigoplus_{j=1}^{N} L^2(\mu_j) \to \mathcal{H}_{\mu,N}. \]

Here many (technical) details, mainly about measures and spectral measures, are
involved for which we have to refer to [8].


29.2.3. Generalized Eigenfunctions 


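Suppose we are given a Gelfand triple $\Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi'$ and a linear operator $A$
on $\Phi$ which maps $\Phi$ continuously into itself and whose closure $\bar{A}$ is a self-adjoint
operator in $\mathcal{H}$. An element $F_{\lambda} \in \Phi'$ is called a generalized eigenfunction of $A$
corresponding to the (generalized) eigenvalue $\lambda \in \mathbb{R}$ if, and only if, for all $\phi \in \Phi$

\[ F_{\lambda}(A\phi) = \lambda F_{\lambda}(\phi). \tag{29.12} \]

Since $A$ maps $\Phi$ continuously into itself its dual is a well defined map $A' : \Phi' \to \Phi'$
and the defining equation for generalized eigenfunctions can be written as an equation
in $\Phi'$:

\[ A' F_{\lambda} = \lambda F_{\lambda}. \]

Note that for $\lambda \in \sigma_p(A)$ the eigenfunction $\psi_{\lambda} \in D(A)$ can also be considered as a
generalized eigenfunction $\psi_{\lambda}$ via the (antilinear) embedding of $\mathcal{H}$ into $\Phi'$: For all
$\phi \in \Phi$ one has

\[ \psi_{\lambda}(A\phi) = (\psi_{\lambda}, A\phi) = (A\psi_{\lambda}, \phi) = (\lambda\psi_{\lambda}, \phi) = \lambda\, \psi_{\lambda}(\phi). \]

For given $\lambda \in \sigma(A)$ denote by $\Phi_{\lambda}' = \Phi_{\lambda}'(A) = \{F \in \Phi' : A'F = \lambda F\}$ the space of
corresponding generalized eigenvectors and then for $\phi \in \Phi$ define $\phi_{\lambda} : \Phi_{\lambda}' \to \mathbb{C}$
by $\phi_{\lambda}(F) = F(\phi)$. The assignment

\[ \phi \mapsto \phi_{\lambda}, \quad \lambda \in \sigma(A) \]

is called the spectral decomposition of $\phi$ corresponding to the operator $A$. Naturally
the spectral decomposition of $\psi = A\phi$ is $\psi_{\lambda} = \lambda\phi_{\lambda}$ since for all $F \in \Phi_{\lambda}'$

\[ \psi_{\lambda}(F) = F(A\phi) = \lambda F(\phi) = \lambda\, \phi_{\lambda}(F). \]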


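Definition 29.4 As above let $A$ be a self-adjoint operator in the Gelfand triple
$\Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi'$. The set of generalized eigenvectors of $A$ is called complete if, and
only if,

\[ \phi_{\lambda} = 0 \ \text{ for all } \lambda \in \sigma(A) \quad \Longrightarrow \quad \phi = 0 \]

for $\phi \in \Phi$.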


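Example 29.1 Consider again the operator $A = P = \frac{1}{i}\frac{d}{dx}$ in the Gelfand triple
$\Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi'$ with $\mathcal{H} = L^2(\mathbb{R})$ and $\Phi = \mathcal{S}(\mathbb{R})$. We know $\sigma(P) = \mathbb{R}$ and for $q \in \mathbb{R}$
the generalized eigenvector $F_q \in \Phi'$ for $P$ is the regular distribution $I_{e_q}$ defined by
the exponential $e_q(x) = e^{ixq}$, since, because of the antilinear embedding of $\mathcal{H}$ into
$\Phi'$, one has

\[ F_q(P\phi) = \int \overline{e_q(x)}\, \frac{1}{i}\phi'(x)\, dx = q \int e^{-ixq}\phi(x)\, dx = q\, F_q(\phi) \]

for all $\phi \in \Phi$. The corresponding spectral decomposition of $\phi \in \Phi$ is given by
$\phi_q$, $q \in \mathbb{R}$, where $\phi_q(F_q) = F_q(\phi) = (2\pi)^{1/2}\, \tilde{\phi}(q)$ is proportional to the Fourier
transform of $\phi$. And we know that the Fourier transform is an isomorphism on $\mathcal{S}(\mathbb{R})$.
Thus the system of generalized eigenvectors of $P$ is complete.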
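The eigenvalue relation $F_q(P\phi) = q F_q(\phi)$ can be verified numerically for a concrete test function. The following sketch assumes the Gaussian $\phi(x) = e^{-x^2/2}$, the convention $P = \frac{1}{i}\frac{d}{dx}$, and a plain trapezoidal rule on a truncated interval; the names `F_q`, `phi` and `P_phi` are hypothetical.

```python
import cmath
import math


def F_q(func, q, half_width=12.0, steps=4800):
    """Trapezoidal approximation of the integral of exp(-i x q) * func(x)
    over [-half_width, half_width]; the Gaussian tails beyond are negligible."""
    h = 2 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * x * q) * func(x)
    return total * h


def phi(x):
    """Test function in S(R): a Gaussian."""
    return math.exp(-x * x / 2)


def P_phi(x):
    """(P phi)(x) = (1/i) phi'(x), with phi'(x) = -x exp(-x^2/2)."""
    return (1 / 1j) * (-x) * math.exp(-x * x / 2)


if __name__ == "__main__":
    q = 1.3
    print(F_q(P_phi, q), q * F_q(phi, q))
    # For this Gaussian F_q(phi) is known in closed form: sqrt(2 pi) exp(-q^2 / 2).
    print(F_q(phi, q), math.sqrt(2 * math.pi) * math.exp(-q * q / 2))
```

Both printed pairs should agree up to quadrature and rounding error, in line with $F_q(\phi) = (2\pi)^{1/2}\tilde{\phi}(q)$.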


29.2.4 Completeness of Generalized Eigenfunctions 


In order to proceed we need the counterpart of Theorem 29.2 where the space
$L^2(X, \mu)$ is replaced by a direct integral $\mathcal{H}_{\mu,N}$ of Hilbert spaces $\mathcal{H}_{\lambda}$, $\lambda \in \Lambda$
(Definition 29.3). Naturally this case is more involved.


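Theorem 29.4 Suppose that in a rigged Hilbert space $\Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi'$ the Hilbert
space $\mathcal{H}$ has a realization as a direct integral $\mathcal{H}_{\mu,N}$ of Hilbert spaces $\mathcal{H}_{\lambda}$, with respect
to a positive Borel measure $\mu$ on $\Lambda$, i.e., there is a unitary map

\[ U : \mathcal{H} \to \mathcal{H}_{\mu,N} = \int_{\Lambda}^{\oplus} \mathcal{H}_{\lambda}\, d\mu(\lambda). \]

Then there exist $n \in \mathbb{N}$ and a map $F$ which assigns to every $\lambda \in \Lambda$ a continuous
linear map $F_{\lambda} : \Phi_n \to \mathcal{H}_{\lambda}$ such that for all $\phi \in \Phi$ one has for $\mu$-almost all $\lambda \in \Lambda$

\[ F_{\lambda}(\phi) = (U\phi)(\lambda) \in \mathcal{H}_{\lambda}. \tag{29.13} \]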




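Proof According to Theorem 29.1 there is $n \in \mathbb{N}$ such that the natural embedding
$T : \Phi \to \mathcal{H}$ is of the form (29.3). Hence the embedding of $\Phi$ into $\mathcal{H}_{\mu,N}$ is of the
form

\[ UT(\phi) = \sum_{k=1}^{\infty} \lambda_k\, F_k(\phi)\, h_k, \quad \phi \in \Phi, \tag{29.14} \]

where $\{h_k = U f_k\}$ is an orthonormal basis of $\mathcal{H}_{\mu,N}$. It follows that

\[ \|UT(\phi)\|_{\mathcal{H}_{\mu,N}}^2 = \sum_{k=1}^{\infty} \lambda_k^2\, |F_k(\phi)|^2 < \infty \]

since $|F_k(\phi)| \le \|\phi\|_n$ for all $k$ and $\sum_k \lambda_k < \infty$. As an orthonormal system in $\mathcal{H}_{\mu,N}$
the elements $h_k$ satisfy $\int_{\Lambda} \|h_k(\lambda)\|_{\mathcal{H}_{\lambda}}^2\, d\mu(\lambda) = 1$ and therefore, because the series
contains only nonnegative terms,

\[ \sum_{k=1}^{\infty} \lambda_k = \sum_{k=1}^{\infty} \lambda_k \int_{\Lambda} \|h_k(\lambda)\|_{\mathcal{H}_{\lambda}}^2\, d\mu(\lambda) = \int_{\Lambda} \sum_{k=1}^{\infty} \lambda_k\, \|h_k(\lambda)\|_{\mathcal{H}_{\lambda}}^2\, d\mu(\lambda) \]

and we deduce that the series

\[ \sum_{k=1}^{\infty} \lambda_k\, \|h_k(\lambda)\|_{\mathcal{H}_{\lambda}}^2 \]

converges for $\mu$-almost all $\lambda \in \Lambda$. The Cauchy-Schwarz inequality implies that the
series

\[ \sum_{k=1}^{\infty} \lambda_k\, \|h_k(\lambda)\|_{\mathcal{H}_{\lambda}} \]

converges for $\mu$-almost all $\lambda \in \Lambda$. Since $|F_k(\phi)| \le \|\phi\|_n$ for all $k$, it follows that for
every $\phi \in \Phi$ the series

\[ \sum_{k=1}^{\infty} \lambda_k\, F_k(\phi)\, h_k(\lambda) \tag{29.15} \]

converges in $\mathcal{H}_{\lambda}$ for $\mu$-almost all $\lambda \in \Lambda$ and defines an element $F_{\lambda}(\phi) \in \mathcal{H}_{\lambda}$ such
that

\[ \|F_{\lambda}(\phi)\|_{\mathcal{H}_{\lambda}} \le \sum_{k=1}^{\infty} \lambda_k\, \|h_k(\lambda)\|_{\mathcal{H}_{\lambda}}\, \|\phi\|_n. \]

Since $T$ is an embedding we know $\phi = T\phi$ and thus $U\phi = UT\phi$ and therefore by
(29.14) $U\phi(\lambda)$ and $F_{\lambda}(\phi)$ have the same series representation (29.15), hence (29.13)
follows.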




Theorem 29.5 (Existence and Completeness of Generalized Eigenfunctions)
Suppose that $A$ is a self-adjoint operator in a separable Hilbert space $\mathcal{H}$ for which
a rigging $\Phi \hookrightarrow \mathcal{H} \hookrightarrow \Phi'$ exists such that $\Phi \subset D(A)$ and such that $A : \Phi \to \Phi$
is continuous. Then $A$ has a complete set of generalized eigenfunctions in $\Phi'$.


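Proof Apply Theorem 29.4 in the case where the direct integral of Hilbert spaces is
determined by the functional form of the spectral theorem (Theorem 29.3), i.e.,

\[ \mathcal{H}_{\mu,N} = \int_{\sigma(A)}^{\oplus} \mathcal{H}_{\lambda}\, d\mu(\lambda). \]

Thus there is $n \in \mathbb{N}$ and a map $F$ on $\sigma(A)$ with values $F_{\lambda}$ which are continuous
linear mappings $\Phi_n \to \mathcal{H}_{\lambda}$ such that for all $\phi \in \Phi$ and for $\mu$-almost all $\lambda \in \sigma(A)$
Relation (29.13) holds. Taking (29.10) into account we find for every $\phi \in \Phi$ and
$\mu$-almost all $\lambda \in \sigma(A)$

\[ F_{\lambda}(A\phi) = U(A\phi)(\lambda) = (UAU^{-1})(U\phi)(\lambda) = \lambda\, (U\phi)(\lambda) = \lambda\, F_{\lambda}(\phi). \]

Since $A : \Phi \to \Phi$ is continuous its adjoint $A'$ is well defined on $\Phi'$ by duality
and for $\mu$-almost all $\lambda \in \sigma(A)$ one has $A' F_{\lambda} = \lambda F_{\lambda}$, hence the $F_{\lambda}$ are $\mathcal{H}_{\lambda}$-valued
generalized eigenfunctions of $A$ and for $x \in \mathcal{H}_{\lambda}$ the inner products $(x, F_{\lambda})_{\mathcal{H}_{\lambda}} \in \Phi'$ are
generalized eigenfunctions of $A$.
Let

\[ \phi \mapsto \phi_{\lambda}, \quad \lambda \in \sigma(A) \]

be the spectral decomposition of $\phi \in \Phi$ with respect to the operator $A$ and suppose
$\phi_{\lambda} = 0$ for $\mu$-almost all $\lambda \in \sigma(A)$. By definition this means $U\phi(\lambda) = F_{\lambda}(\phi) = 0$
for $\mu$-almost all $\lambda \in \sigma(A)$ and thus

\[ \|\phi\|_{\mathcal{H}}^2 = \|U\phi\|_{\mathcal{H}_{\mu,N}}^2 = \int_{\sigma(A)} \|U\phi(\lambda)\|_{\mathcal{H}_{\lambda}}^2\, d\mu(\lambda) = 0, \]

hence $\phi = 0$ and our system of generalized eigenfunctions of $A$ is complete.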


Remark 29.3 Since nuclearity plays a prominent role in Theorem 29.5, it is often
called the nuclear spectral theorem. There are several approaches to prove the
existence and completeness of generalized eigenfunctions for self-adjoint operators.
A fairly comprehensive list of references for these approaches and for applications
in physics can be found for instance in [10]. Our exposition is based on early results
by Gelfand [2].


Remark 29.4 It is a simple exercise to show that Theorem 29.5 applies to the momentum
operator $P$ considered in Example 29.1 and thus proves the existence and
completeness of the system of generalized eigenfunctions for $P$ which had been
shown by elementary means in this case.

If we review the proof of Theorem 29.5 we see easily that the assumptions could
be weakened. Instead of assuming that the given Hilbert space $\mathcal{H}$ is rigged by a
nuclear space $\Phi$ it is enough to assume that there is some Hilbert space $\Phi_n$ which
is densely embedded into $\mathcal{H}$ by a nuclear map and on which the given self-adjoint
operator $A$ acts continuously.

Such a version is important for instance if the given self-adjoint operator $A$ is a
Schrödinger operator $H = \frac{1}{2m}P^2 + V(Q)$ in $\mathcal{H} = L^2(\mathbb{R}^3)$: If we worked in the
rigged Hilbert space $\mathcal{S}(\mathbb{R}^3) \hookrightarrow L^2(\mathbb{R}^3) \hookrightarrow \mathcal{S}'(\mathbb{R}^3)$, the condition that $H$ maps $\mathcal{S}(\mathbb{R}^3)$
continuously into itself would require that the potential is a multiplier for $\mathcal{S}(\mathbb{R}^3)$, i.e.,
a $C^{\infty}$ function with all derivatives polynomially bounded. But often one has to deal
with potentials $V$ with much less regularity. If one can find a Sobolev-type space $\Phi_n$
on which $H$ acts continuously and which is embedded into $L^2(\mathbb{R}^3)$ by a nuclear map
then one could work under much weaker assumptions on $V$.


29.3. Exercises 


1. Show that the system of Hilbertian norms defined by the inner products h, on 
S(R) in Sect. 29.1.3.2 is equivalent to the original system of norms on this space. 

2. Complete the argument in Remark 29.2 and thus show Theorem 29.3. 
Hint: Consult for instance the Appendix to Chap. I of [2]. 

3. Prove that the position operator $Q$ in $L^2(\mathbb{R})$ has a complete set of generalized
eigenfunctions.


References 


1. Grothendieck A. Produits tensoriels topologiques et espaces nucléaires. Memoirs AMS.
1955;16.

2. Gel'fand IM, Vilenkin NYa. Generalized functions Vol. 4: applications of harmonic analysis.
New York: Academic; 1964. (trans: A Feinstein).

3. Pietsch A. Nuclear locally convex spaces. Ergebnisse der Mathematik und ihrer Grenzgebiete.
vol. 66. Berlin: Springer-Verlag; 1965.

4. Schaefer HH. Topological vector spaces. GTM. vol. 3. New York: Springer-Verlag; 1971.

5. Robertson AP, Robertson WJ. Topological vector spaces. Cambridge: Cambridge University
Press; 1973.

6. Takesaki M. Theory of operator algebras I. Encyclopedia of mathematical sciences: operator
algebras and non-commutative geometry. vol. 124. Berlin: Springer; 2002.

7. von Neumann J. On rings of operators. Reduction theory. Ann Math. 1949;50:401-485.

8. Birman MS, Solomjak MZ. Spectral theory of self-adjoint operators in Hilbert space. Boston:
Reidel; 1987.

9. Schmüdgen K. Unbounded self-adjoint operators on Hilbert space. Graduate texts in
mathematics. vol. 265. Dordrecht: Springer; 2012.

10. Gadella M, Gómez F. A measure-theoretic approach to the nuclear and inductive spectral
theorems. Bull Sci Math. 2005;129:567-590.


Chapter 30 
Operator Algebras and Positive Mappings 


30.1 Representations of C*-Algebras 


In quantum mechanics, in local quantum field theory (Haag), in the functional
approach to relativistic quantum field theory (Gårding-Wightman), and in quantum
information theory positive functionals and completely positive mappings play a
fundamental role, mainly in connection with the mathematical description of "states"
of quantum systems and their manipulation. For further details about the physical
background we recommend [1, 2].

In this chapter, we present the most important structural results for positive linear
functionals and completely positive maps, namely the Gelfand-Naimark-Segal
representation for positive linear functionals (on C*-algebras) and the Stinespring
factorization theorem. The natural mathematical framework for these results is the
theory of abstract C*-algebras and we formulate these results in this framework,
but in our proofs we consider only the cases of C*-algebras of operators on Hilbert
spaces. This allows us to use some simplifications in the characterization of positive
elements in these algebras. For the general case we refer to the literature, for instance
[3]. In this and in the following chapter elements of a (general or abstract) algebra
are denoted by small Latin letters.


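Definition 30.1 Let $\mathcal{A}$ be an algebra over the field $\mathbb{C}$. If $\mathcal{A}$ admits an involution $*$
which is compatible with the algebraic structure of $\mathcal{A}$, i.e., a mapping $a \mapsto a^*$ such
that for all $a, b \in \mathcal{A}$ the following holds:

\[ (a^*)^* = a, \qquad (\lambda a)^* = \bar{\lambda}\, a^*, \quad \lambda \in \mathbb{C}, \]
\[ (a + b)^* = a^* + b^*, \qquad (ab)^* = b^* a^*, \]

then $\mathcal{A}$ is called an involutive algebra or a *-algebra. If $\mathcal{A}$ admits a norm $\|\cdot\|$ under which
$\mathcal{A}$ is a Banach space such that

\[ \|ab\| \le \|a\|\, \|b\| \quad \text{for all } a, b \in \mathcal{A} \]

then $\mathcal{A}$ is called a Banach algebra. If a Banach algebra has an involution $*$ which makes
$\mathcal{A}$ a *-algebra and for which the norm satisfies $\|a^*\| = \|a\|$ for all $a \in \mathcal{A}$, then $\mathcal{A}$ is
called an involutive Banach algebra or a Banach *-algebra.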


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_30




If in addition the norm of an involutive Banach algebra satisfies the Gelfand
condition

\[ \|a^* a\| = \|a a^*\| = \|a\|^2 \quad \text{for all } a \in \mathcal{A} \]

it is called a C*-algebra.
Often an involutive Banach algebra or a C*-algebra contains a unit $I$. Then we
assume that $\|I\| = 1$.


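Definition 30.2 Suppose that $\mathcal{A}$, $\mathcal{B}$ are *-algebras. A map $\pi : \mathcal{A} \to \mathcal{B}$ is called
a homomorphism of *-algebras or a *-homomorphism if, and only if, it respects
the structure of a *-algebra, i.e.,

\[ \pi(\alpha a + \beta b) = \alpha\pi(a) + \beta\pi(b), \quad \forall a, b \in \mathcal{A},\ \forall \alpha, \beta \in \mathbb{C}; \tag{30.1} \]
\[ \pi(ab) = \pi(a)\pi(b), \quad \forall a, b \in \mathcal{A}; \tag{30.2} \]
\[ \pi(a^*) = \pi(a)^*, \quad \forall a \in \mathcal{A}. \tag{30.3} \]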


Many results about abstract C*-algebras are obtained by studying properties of their
representations by operators on a Hilbert space. These are defined as follows.


Definition 30.3 Let $\mathcal{A}$ be a C*-algebra. A representation $(\pi, \mathcal{H})$ of $\mathcal{A}$ is a
*-homomorphism $\pi$ of $\mathcal{A}$ into the C*-algebra $\mathcal{B}(\mathcal{H})$ of all bounded linear operators on
a Hilbert space $\mathcal{H}$.

A representation $(\pi, \mathcal{H})$ of $\mathcal{A}$ is called cyclic if there exists a cyclic vector, i.e., a
vector $x \in \mathcal{H}$ such that the closed linear subspace $[\pi(\mathcal{A})x]$ generated by all $\pi(a)x \in
\mathcal{H}$, $a \in \mathcal{A}$, equals the representation space $\mathcal{H}$:

\[ [\pi(\mathcal{A})x] = \mathcal{H}. \]


30.1.1 Representations of B(H) 


For the C*-algebra of all bounded linear operators on a Hilbert space the general 
form of its representations can be determined. This result will be used later in our 
analysis of completely positive maps. 

By Theorem 22.3.4 we know that the set of all compact operators $\mathcal{B}_c(\mathcal{H})$ on a (separable)
Hilbert space $\mathcal{H}$ is a C*-algebra (without unit, if $\mathcal{H}$ is infinite dimensional).
The following result clarifies the structure of all its representations.


Theorem 30.1 Every continuous representation $(\pi, \mathcal{H}_{\pi})$ of the C*-algebra $\mathcal{B}_c(\mathcal{H})$
is equivalent to a direct sum $\bigoplus_n (\pi_n, \mathcal{H}_n)$ in which each summand is either the
identity representation, $A \mapsto A$, $A \in \mathcal{B}_c(\mathcal{H})$, or the zero representation $A \mapsto 0$, $A \in \mathcal{B}_c(\mathcal{H})$.


Proof Let $\{e_n\}$ be an orthonormal basis of $\mathcal{H}$ and denote by $P_n$ the orthogonal
projector onto the one-dimensional subspace $[e_n] = \mathbb{C}e_n$ spanned by $e_n$. By Theorem
22.3.2 we know that for $A \in \mathcal{B}_c(\mathcal{H})$ the finite rank operators

\[ \sum_{n=1}^{N} A P_n, \quad N \in \mathbb{N}, \]

converge in (operator) norm to $A$. Since the sequence of projectors $\sum_{j=1}^{M} P_j$, $M \in \mathbb{N}$,
is bounded in norm (by 1) it follows that

\[ A = \lim_{M,N \to \infty} \sum_{j=1}^{M} \sum_{n=1}^{N} P_j A P_n \quad \text{in } \mathcal{B}(\mathcal{H}). \tag{30.4} \]


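Now calculate for $x \in \mathcal{H}$

\[ P_j A P_n x = (e_j, A e_n)(e_n, x)\, e_j \]

and define operators $U_{jn}$ on $\mathcal{H}$ by

\[ U_{jn} x = (e_n, x)\, e_j. \]

Then we can write

\[ P_j A P_n = a_{jn} U_{jn}, \qquad a_{jn} = (e_j, A e_n). \tag{30.5} \]

The operators $U_{jn}$ are partial isometries from $[e_n]$ to $[e_j]$ which satisfy for all $j, n, m \in \mathbb{N}$

\[ U_{nn} = P_n, \qquad U_{jn}^* = U_{nj}, \qquad U_{jn} U_{nm} = U_{jm}. \tag{30.6} \]

As finite rank operators all the operators $P_n$, $U_{jn}$ belong to $\mathcal{B}_c(\mathcal{H})$. Note also that for
any fixed $n \in \mathbb{N}$ the closed linear span of the ranges of all the operators $\{U_{jn} : j \in \mathbb{N}\}$ is $\mathcal{H}$.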
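In finite dimensions the operators $U_{jn}$ are the usual matrix units, and the relations (30.5) and (30.6) can be checked directly. The following is a small sketch under that assumption (dimension 3, indices starting at 0, and helper names `matrix_unit`, `matmul`, `adjoint`, `scale` chosen for illustration only).

```python
def matrix_unit(j, n, dim):
    """Matrix of U_jn x = (e_n, x) e_j: a single 1 in row j, column n."""
    return [[1 if (r == j and c == n) else 0 for c in range(dim)] for r in range(dim)]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    dim = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(dim)) for c in range(dim)]
            for r in range(dim)]


def adjoint(a):
    """Conjugate transpose of a square matrix."""
    dim = len(a)
    return [[a[c][r].conjugate() for c in range(dim)] for r in range(dim)]


def scale(s, a):
    """Scalar multiple s * a."""
    return [[s * v for v in row] for row in a]


if __name__ == "__main__":
    dim = 3
    A = [[1, 2j, 3], [4, 5, 6 - 1j], [7, 8, 9]]
    j, n, m = 0, 1, 2
    P_j, P_n = matrix_unit(j, j, dim), matrix_unit(n, n, dim)
    # (30.5): P_j A P_n = a_jn U_jn
    print(matmul(matmul(P_j, A), P_n) == scale(A[j][n], matrix_unit(j, n, dim)))
    # (30.6): U_jn^* = U_nj and U_jn U_nm = U_jm
    print(adjoint(matrix_unit(j, n, dim)) == matrix_unit(n, j, dim))
    print(matmul(matrix_unit(j, n, dim), matrix_unit(n, m, dim)) == matrix_unit(j, m, dim))
```

Every matrix is the finite sum $\sum_{j,n} a_{jn} U_{jn}$ here; in the infinite-dimensional compact case this sum is replaced by the norm limit (30.4).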

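Now let $(\pi, \mathcal{H}_{\pi})$ be a continuous representation of $\mathcal{B}_c(\mathcal{H})$. For every $A \in \mathcal{B}_c(\mathcal{H})$,
it follows

\[ \pi(A) = \lim_{M,N \to \infty} \sum_{j=1}^{M} \sum_{n=1}^{N} \pi(P_j)\pi(A)\pi(P_n) = \lim_{M,N \to \infty} \sum_{j=1}^{M} \sum_{n=1}^{N} a_{jn}\, \pi(U_{jn}). \tag{30.7} \]

Therefore, the knowledge of all the $\pi(U_{jn})$ allows us to find the representatives $\pi(A)$
for all $A \in \mathcal{B}_c(\mathcal{H})$. Since $\pi$ is a representation, the relations (30.6) also hold for the
representing operators $\pi(P_n)$ and $\pi(U_{jn})$. For $n \in \mathbb{N}$ define

\[ \mathcal{M}_n = \operatorname{ran}(\pi(P_n)) \subset \mathcal{H}_{\pi}. \]

Since $\pi(P_n)\pi(P_j) = 0$ for $n \ne j$ the closed subspaces $\mathcal{M}_n$ of $\mathcal{H}_{\pi}$ are pairwise
orthogonal for different indices. Relations (30.6) for $\pi(P_n)$, $\pi(U_{jn})$ imply furthermore
that the operators $\pi(U_{jn})$ are partial isometries with initial domain $\mathcal{M}_n$ and range
$\mathcal{M}_j$. Hence, all the subspaces $\{\mathcal{M}_n\}$ have the same dimension.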




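If $\pi(P_n) = 0$ for all $n \in \mathbb{N}$, then by (30.7) one has $\pi(A) = 0$ for all $A \in \mathcal{B}_c(\mathcal{H})$
and $\pi$ is the zero representation. If $\pi$ is not the zero representation, there is $n \in \mathbb{N}$
such that $\pi(P_n) \ne 0$ and thus $\mathcal{M}_n \ne \{0\}$. Hence, there is a unit vector $f_n \in \mathcal{M}_n$ and
we can define an orthonormal system $\{f_j\}$ in $\mathcal{H}_{\pi}$ by setting

\[ f_j = \pi(U_{jn}) f_n, \quad \text{for all } j \in \mathbb{N}. \]

This orthonormal system generates a closed linear subspace $\mathcal{M}_{\pi} \subset \mathcal{H}_{\pi}$:

\[ \mathcal{M}_{\pi} = [\{f_j\}]. \]

We now show that this subspace is invariant under all $\pi(A)$, $A \in \mathcal{B}_c(\mathcal{H})$: Observe
first that

\[ \pi(U_{jm}) f_k = \pi(U_{jm})\pi(U_{kn}) f_n = 0 \quad \text{for } m \ne k, \]
\[ \pi(U_{jm}) f_m = \pi(U_{jm})\pi(U_{mn}) f_n = \pi(U_{jn}) f_n = f_j \]

holds. Because of (30.5) this implies

\[ \pi(P_j)\pi(A)\pi(P_m) f_k = 0 \quad \text{for } m \ne k, \tag{30.8} \]
\[ \pi(P_j)\pi(A)\pi(P_m) f_m = a_{jm} f_j. \tag{30.9} \]

Thus, all the operators $\pi(P_j)\pi(A)\pi(P_m)$ are reduced by the subspace $\mathcal{M}_{\pi}$; now
(30.7) implies that all operators $\pi(A)$, $A \in \mathcal{B}_c(\mathcal{H})$, are reduced by this subspace too.

In the subspace $\mathcal{M}_{\pi}$ the operator $\pi(P_j)$ is the projection onto the subspace $\mathbb{C} f_j$,
$j \in \mathbb{N}$, hence by (30.9) the matrix

\[ [a_{jm}] = [(e_j, A e_m)] = [(f_j, \pi(A) f_m)_{\mathcal{H}_{\pi}}] \tag{30.10} \]

is the matrix of $\pi(A)$ with respect to the orthonormal basis $\{f_j\}$ of $\mathcal{M}_{\pi}$.
Next, define an isometric mapping $V$ of $\mathcal{H}$ onto $\mathcal{M}_{\pi}$ by setting

\[ V e_j = f_j, \quad j \in \mathbb{N} \]

and extend it by linearity and continuity to all of $\mathcal{H}$. Relation (30.10) implies

\[ V^* \pi(A) V = A \quad \text{for all } A \in \mathcal{B}_c(\mathcal{H}) \]

and thus the restriction of the representation $\pi$ to $\mathcal{M}_{\pi}$ is unitarily equivalent to the
identity representation.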

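If $\mathcal{M}_{\pi} = \mathcal{H}_{\pi}$, we are done. Otherwise look at the orthogonal complement $\mathcal{M}_{\pi}^{\perp}$ of
$\mathcal{M}_{\pi}$ in $\mathcal{H}_{\pi}$. Certainly, $\mathcal{M}_{\pi}^{\perp}$ is invariant under all $\pi(A)$, $A \in \mathcal{B}_c(\mathcal{H})$. For all $x \in \mathcal{M}_{\pi}$
and all $y \in \mathcal{M}_{\pi}^{\perp}$ we have

\[ (x, \pi(A)y)_{\mathcal{H}_{\pi}} = (\pi(A^*)x, y)_{\mathcal{H}_{\pi}} = 0 \]

since $\pi(A^*)x \in \mathcal{M}_{\pi}$. Thus, the restriction of $\pi(A)$ to $\mathcal{M}_{\pi}^{\perp}$ defines a representation of
$\mathcal{B}_c(\mathcal{H})$ in $\mathcal{M}_{\pi}^{\perp}$ and we can proceed as above to find an invariant subspace $\mathcal{M}_{\pi}'$ of $\mathcal{M}_{\pi}^{\perp}$
on which this representation is unitarily equivalent to the identity representation.
Now by iteration of this argument we conclude.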


Theorem 30.2 (Naimark) Every representation $(\pi, \mathcal{H}_{\pi})$ of the C*-algebra $\mathcal{B}(\mathcal{H})$ of
all bounded operators on a separable Hilbert space $\mathcal{H}$ is the direct sum of identity
representations $A \mapsto A$ and a representation of the quotient algebra $\mathcal{B}(\mathcal{H})/\mathcal{B}_c(\mathcal{H})$.

If the representation of this quotient algebra is not the zero representation, then
it is an isomorphism of $\mathcal{B}(\mathcal{H})/\mathcal{B}_c(\mathcal{H})$ into the algebra of bounded linear operators
on a Hilbert space.


Proof Since $\mathcal{B}(\mathcal{H})$ is a C*-algebra with unit, Theorem 30.3 implies that every
representation of it is continuous. Therefore, a representation $(\pi, \mathcal{H}_\pi)$ of $\mathcal{B}(\mathcal{H})$ is at
the same time a continuous representation of $\mathcal{B}_c(\mathcal{H}) \subset \mathcal{B}(\mathcal{H})$. Thus Theorem 30.1
applies. Hence, modulo a unitary map the representation space $\mathcal{H}_\pi$ is the direct sum
$\bigoplus_n \mathcal{H}_n$ of copies $\mathcal{H}_n = \mathcal{H}$ of $\mathcal{H}$ and a space $\mathcal{H}_0$. In each of the spaces $\mathcal{H}_n$, the
representation $\pi$ is the identity representation of $\mathcal{B}_c(\mathcal{H})$ while in $\mathcal{H}_0$ it is the zero
representation. We have to show that for arbitrary but fixed $n$, the representation $\pi$
of $\mathcal{B}(\mathcal{H})$ also reduces to the identity representation in $\mathcal{H}_n$.

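With the orthogonal projector $Q_n : \mathcal{H}_\pi \to \mathcal{H}_n$ introduce $\pi(A)_n = Q_n \pi(A) Q_n$,
i.e.,

\[ \pi(A)_n x = Q_n \pi(A) x \quad \text{for all } x \in \mathcal{H}_n. \]

Clearly, $\pi(A)_n$ is a well defined bounded linear operator on $\mathcal{H}_n$. We show $\pi(A)_n = A$
by showing that their matrices coincide, calculated with respect to an ONB $\{e_j\}$ of
$\mathcal{H}_n = \mathcal{H}$. For the projection operator $P_j$ onto $\mathbb{C} e_j$, we have as earlier for all $i, j \in \mathbb{N}$,

\[ P_i A P_j = a_{ij} U_{ij} \]

and thus

\[ \pi(P_i)\pi(A)\pi(P_j) = \pi(P_i A P_j) = a_{ij}\, \pi(U_{ij}). \]

Recall $P_i, U_{ij}, P_i A P_j \in \mathcal{B}_c(\mathcal{H})$ and thus we can calculate as follows:

\[ \langle e_i, \pi(A)_n e_j\rangle = \langle Q_n e_i, \pi(A) e_j\rangle = \langle e_i, \pi(A) e_j\rangle = \langle P_i e_i, \pi(A) P_j e_j\rangle = \langle e_i, P_i \pi(A) P_j e_j\rangle = a_{ij} \langle e_i, U_{ij} e_j\rangle = a_{ij} \]

and we conclude $\pi(A)_n = A$.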
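It follows, for all $x \in \mathcal{H}_n$, $A, B \in \mathcal{B}(\mathcal{H})$,

\[ Q_n \pi(A)\pi(B) x = (AB)x, \qquad Q_n \pi(A) Q_n \pi(B) x = A(Bx), \]

and therefore $Q_n \pi(A)(\pi(B)x) = Q_n \pi(A) Q_n (\pi(B)x)$. Denote the closed linear hull
of the set $\{\pi(B)x : x \in \mathcal{H}_n, B \in \mathcal{B}(\mathcal{H})\}$ by $\mathcal{H}'_n$. Then the above argument shows that
in $\mathcal{H}'_n$ one has

\[ Q_n \pi(A) = Q_n \pi(A) Q_n. \tag{30.11} \]

Naturally $\mathcal{H}_n \subset \mathcal{H}'_n$ and therefore $Q_n = 0$ on $\mathcal{H}'^{\perp}_n \subset \mathcal{H}_\pi$. As earlier one shows that
$\mathcal{H}'_n$ and $\mathcal{H}'^{\perp}_n$ are invariant under all $\pi(A)$, $A \in \mathcal{B}(\mathcal{H})$, hence $Q_n \pi(A) = 0$ in $\mathcal{H}'^{\perp}_n$
and (30.11) holds in all of $\mathcal{H}_\pi$.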

Now apply the involution to (30.11) and in the result replace $A^*$ by $A$. This gives
$\pi(A) Q_n = Q_n \pi(A) Q_n$ and thus

\[ Q_n \pi(A) = \pi(A) Q_n, \]

i.e., the space $\mathcal{H}_n$ reduces all the operators $\pi(A)$, $A \in \mathcal{B}(\mathcal{H})$. In the space $\mathcal{H}_n$, the
identity $\pi(A)_n = A$ now takes the form

\[ A x = Q_n \pi(A) x = \pi(A) Q_n x = \pi(A) x, \quad x \in \mathcal{H}_n, \]

and thus in the space $\mathcal{H}_n$ the representation $\pi$ of all of $\mathcal{B}(\mathcal{H})$ reduces to the identity
representation.


30.2. On Positive Elements and Positive Functionals 


An element $a$ of a *-algebra $\mathcal{A}$ is called positive if, and only if, one of the following
equivalent conditions holds:

\[ a = b^* b \quad \text{for some } b \in \mathcal{A}; \tag{30.12} \]

\[ a = c^2 \quad \text{for some } c \in \mathcal{A}_h = \{a \in \mathcal{A} : a^* = a\}. \tag{30.13} \]

In the case that $\mathcal{A}$ is a C*-algebra of operators on a Hilbert space $\mathcal{H}$ or a subspace of
such an algebra one has a third characterization of positive elements $a$, namely

\[ \langle x, a x\rangle \ge 0 \quad \text{for all } x \in \mathcal{H}. \tag{30.14} \]

In this case the proof of equivalence of these three conditions is straightforward
by using the square root lemma (Theorem 21.5.1) and the polar decomposition of
operators (Theorem 21.5.2). For the characterization of positive elements in abstract
C*-algebras one has to refer to the spectral theory for these algebras (see Theorem
6.1 of [3]).

Using the characterization (30.14) of positive elements, it follows easily that the
set $\mathcal{A}_+$ of all positive elements in $\mathcal{A}$ is a closed convex cone, i.e., if $a, b \in \mathcal{A}_+$ and
$\alpha, \beta \ge 0$ then $\alpha a + \beta b \in \mathcal{A}_+$. This cone satisfies $\mathcal{A}_+ \cap (-\mathcal{A}_+) = \{0\}$. Hence
$\mathcal{A}_+$ induces an order in the real Banach space $\mathcal{A}_h$ of hermitian elements in $\mathcal{A}$. For
$a, b \in \mathcal{A}_h$ we write $a \ge b$ if, and only if, $a - b \in \mathcal{A}_+$.


Definition 30.4 A linear map $f : \mathcal{A} \to \mathbb{C}$ on a C*-algebra $\mathcal{A}$ is called a positive
functional if its restriction to the cone $\mathcal{A}_+$ of positive elements has only nonnegative
values, i.e., if $f(a) \ge 0$ for all $a \in \mathcal{A}_+$.




The following proposition collects the basic facts about positive linear functionals. 


Proposition 30.1 For a positive linear functional $f$ on a C*-algebra $\mathcal{A}$ with unit $I$
one has:

(a) For all $a, b \in \mathcal{A}$

\[ \overline{f(a^* b)} = f(b^* a) \tag{30.15} \]
\[ |f(a^* b)|^2 \le f(a^* a)\, f(b^* b) \tag{30.16} \]

(b) $f$ is continuous and $\|f\| = f(I)$.


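Proof For the proof of the first part of (a) take arbitrary $a, b \in \mathcal{A}$ and apply $f$ to the
two polarization identities

\[ 4 a^* b = \sum_{j=0}^{3} i^{j} (b + i^{j} a)^* (b + i^{j} a), \qquad 4 b^* a = \sum_{j=0}^{3} i^{-j} (b + i^{j} a)^* (b + i^{j} a) \]

and compare the results. For the proof of the estimate take arbitrary $a, b \in \mathcal{A}$ and
arbitrary $\alpha, \beta \in \mathbb{C}$ and observe

\[ 0 \le f((\alpha a + \beta b)^* (\alpha a + \beta b)) = |\alpha|^2 f(a^* a) + \bar{\alpha}\beta f(a^* b) + \alpha\bar{\beta} f(b^* a) + |\beta|^2 f(b^* b), \]

thus, because of (30.15), the quadratic form on $\mathbb{C}^2$

\[ |\alpha|^2 f(a^* a) + 2\,\mathrm{Re}\bigl(\bar{\alpha}\beta f(a^* b)\bigr) + |\beta|^2 f(b^* b) \]

is nonnegative, hence its coefficients have to satisfy (30.16).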
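
The last step can be spelled out as follows; this is a short supplementary derivation, with the matrix name $M$ introduced here only for illustration. Nonnegativity of the quadratic form for all $(\alpha, \beta) \in \mathbb{C}^2$ says that a Hermitian $2 \times 2$ matrix is positive semidefinite, and such a matrix has a nonnegative determinant:

```latex
\[
  M = \begin{pmatrix} f(a^*a) & f(a^*b) \\ \overline{f(a^*b)} & f(b^*b) \end{pmatrix} \ge 0
  \quad\Longrightarrow\quad
  \det M = f(a^*a)\, f(b^*b) - |f(a^*b)|^2 \ge 0 .
\]
```

The off-diagonal entries are complex conjugates of each other precisely because of (30.15).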
For the proof of (b) observe for all $a \in \mathcal{A}$ and all $x \in \mathcal{H}$

\[ \langle x, a^* a x\rangle = \|a x\|^2 \le \|a\|^2 \|x\|^2, \]

thus

\[ a^* a \le \|a\|^2 I, \tag{30.17} \]

and hence for a positive functional $f$ it follows $f(a^* a) \le \|a\|^2 f(I)$. The Cauchy-Schwarz
inequality (30.16) implies $|f(a)|^2 \le f(I^* I) f(a^* a)$ and we find for all
$a \in \mathcal{A}$

\[ |f(a)| \le f(I)\, \|a\| \]

and this proves that $f$ is continuous and that $\|f\| = \sup\{|f(a)| : a \in \mathcal{A}, \|a\| \le 1\} \le f(I)$.
But clearly $f(I) \le \|f\|$ and therefore $\|f\| = f(I)$.




Let us consider some simple examples of positive functionals on a C*-algebra
$\mathcal{A}$ of operators on a Hilbert space $\mathcal{H}$:
For $x \in \mathcal{H}$ define a function $f_x : \mathcal{A} \to \mathbb{C}$ by

\[ f_x(a) = \langle x, a x\rangle \quad \text{for all } a \in \mathcal{A}. \tag{30.18} \]

Clearly, by condition (30.14) this functional is positive when restricted to $\mathcal{A}_+$.


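Theorem 30.3 Every representation $(\pi, \mathcal{H})$ of a C*-algebra $\mathcal{A}$ with unit $I$ is
continuous and for all $a \in \mathcal{A}$

\[ \|\pi(a)\| \le \|a\|. \]


Proof Given a representation $(\pi, \mathcal{H})$ of $\mathcal{A}$, take $x \in \mathcal{H}$ and define a functional
$f_x : \mathcal{A} \to \mathbb{C}$ by

\[ f_x(a) = \langle x, \pi(a) x\rangle \quad \text{for all } a \in \mathcal{A}. \]

Since $\pi$ is a *-homomorphism, $f_x$ is a positive functional on $\mathcal{A}$: For all $a \in \mathcal{A}$ one
has

\[ f_x(a^* a) = \|\pi(a) x\|^2 \ge 0. \]

By Proposition 30.1 the norm of this functional is $\|f_x\| = f_x(I) = \|x\|^2$. It follows

\[ \|\pi(a) x\|^2 = f_x(a^* a) \le \|f_x\|\, \|a^* a\| = \|x\|^2 \|a\|^2 \]

and thus $\|\pi(a) x\| \le \|x\|\, \|a\|$, hence $\|\pi(a)\| = \sup\{\|\pi(a) x\| : x \in \mathcal{H}, \|x\| \le 1\} \le \|a\|$.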


30.2.1 The GNS-Construction 


In this section we provide the answer to the question: What is the general form of 
positive functionals on a C*-algebra with unit? 

The answer has been well known for many years and is given by the GNS-construction
(Gelfand-Naimark-Segal) which we explain now. This construction can be done in
a much more general setting. In the algebraic framework of quantum physics this
construction allows one to recover the Hilbert space of the theory.


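Theorem 30.4 (GNS-Construction for $\mathcal{A}$) Let $f$ be a positive functional on a
C*-algebra $\mathcal{A}$ with unit $I$ with $f(I) = 1$. Then there is a Hilbert space $\mathcal{H}_f$, a unit vector
$\Omega_f \in \mathcal{H}_f$, and a mapping $\pi_f$ on $\mathcal{A}$ with values in the space $\mathcal{B}(\mathcal{H}_f)$ of bounded
linear operators on $\mathcal{H}_f$ with the following properties:


1. $\pi_f : \mathcal{A} \to \mathcal{B}(\mathcal{H}_f)$ is linear;
2. $\pi_f(ab) = \pi_f(a)\pi_f(b)$ for all $a, b \in \mathcal{A}$;
3. $\pi_f(a^*) = \pi_f(a)^*$ for all $a \in \mathcal{A}$;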




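such that

\[ f(a) = \langle \Omega_f, \pi_f(a) \Omega_f\rangle \quad \text{for all } a \in \mathcal{A} \tag{30.19} \]

where $\langle \cdot, \cdot\rangle$ denotes the scalar product of $\mathcal{H}_f$.

In addition one has $[\pi_f(\mathcal{A})\Omega_f] = \mathcal{H}_f$, i.e., $\Omega_f$ is cyclic so that $\pi_f$ is a cyclic
representation of $\mathcal{A}$.

The triple $(\mathcal{H}_f, \Omega_f, \pi_f)$ is unique up to unitary equivalence, i.e., if we also have
$f(a) = \langle \Omega, \pi(a)\Omega\rangle$ for all $a \in \mathcal{A}$ where the triple $(\mathcal{H}, \Omega, \pi)$ has the properties
specified above for $(\mathcal{H}_f, \Omega_f, \pi_f)$ then there is a unitary operator $U : \mathcal{H}_f \to \mathcal{H}$ such
that $\Omega = U\Omega_f$ and $\pi(a) = U\pi_f(a)U^*$ for all $a \in \mathcal{A}$.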


Proof Construction of $\mathcal{H}_f$: Define

\[ I_f = \{a \in \mathcal{A} : f(a^* a) = 0\}. \]

By Proposition 30.1 the given functional $f$ is continuous on $\mathcal{A}$; since $a \mapsto a^* a$ is
continuous on $\mathcal{A}$, $I_f$ is a closed subset. If $a, b \in I_f$ then

\[ f((a+b)^*(a+b)) = f(a^* a) + 2\,\mathrm{Re}(f(a^* b)) + f(b^* b) = 2\,\mathrm{Re}(f(a^* b)) = 0 \]

since by (30.16) $|f(a^* b)|^2 \le f(a^* a) f(b^* b) = 0$, hence $a + b \in I_f$. Similarly, for $a \in \mathcal{A}$
and $b \in I_f$ the Cauchy-Schwarz inequality (30.16) implies that

\[ f((ab)^*(ab)) = f(b^* a^* a b) \le f(b^* b)^{1/2}\, f((a^* a b)^*(a^* a b))^{1/2} = 0, \]

hence $ab \in I_f$. Since $I_f$ is obviously invariant under multiplication with scalars we
conclude that $I_f$ is a closed left ideal in $\mathcal{A}$:

\[ \mathcal{A} \cdot I_f \subseteq I_f. \tag{30.20} \]


Form the quotient space $\mathcal{H}_f^0$ of $\mathcal{A}$ with respect to $I_f$, i.e., the space of all equivalence
classes

\[ [a]_f = a + I_f, \quad a \in \mathcal{A}; \tag{30.21} \]
\[ \mathcal{H}_f^0 = \mathcal{A}/I_f = \{[a]_f : a \in \mathcal{A}\}. \tag{30.22} \]

On $\mathcal{H}_f^0$ define addition and scalar multiplication of equivalence classes through their
representatives, i.e.,

\[ [a]_f + [b]_f = [a + b]_f, \qquad \lambda [a]_f = [\lambda a]_f, \quad \forall\, \lambda \in \mathbb{C},\ a, b \in \mathcal{A}. \]

Thus, $\mathcal{H}_f^0$ becomes a complex vector space.
Next one shows that the formula

\[ \langle [a]_f, [b]_f\rangle = f(a^* b), \quad [a]_f, [b]_f \in \mathcal{H}_f^0 \tag{30.23} \]

defines a scalar product on the vector space $\mathcal{H}_f^0$. Finally, define $\mathcal{H}_f$ as the completion
of $\mathcal{H}_f^0$ with respect to the norm defined by this scalar product and extend the scalar
product (30.23) by continuity to $\mathcal{H}_f$. Thus, $\mathcal{H}_f$ is a complex Hilbert space.
Construction of $\Omega_f$ and $\pi_f$: first define

\[ \Omega_f = [I]_f. \tag{30.24} \]

Clearly, $\Omega_f \in \mathcal{H}_f$ satisfies $\langle \Omega_f, \Omega_f\rangle = f(I^* I) = f(I) = 1$. Hence this is a unit
vector.
Next define

\[ \pi_f^0(a)[b]_f = [ab]_f, \quad \forall\, a, b \in \mathcal{A}. \tag{30.25} \]

Because of property (30.20), $\pi_f^0(a)$ is well defined. And it follows easily that $\pi_f^0(a)$ is
a linear operator on $\mathcal{H}_f^0$ for all $a \in \mathcal{A}$. In order to prove boundedness of the linear
operator $\pi_f^0(a)$ we estimate as follows. For all $b \in \mathcal{A}$ one has, using (30.17),

\[ \|\pi_f^0(a)[b]_f\|_f^2 = \langle [ab]_f, [ab]_f\rangle = f((ab)^* ab) = f(b^* a^* a b) \le f(b^* \|a\|^2 I\, b) = \|a\|^2 f(b^* b) = \|a\|^2 \langle [b]_f, [b]_f\rangle, \]

hence $\pi_f^0(a)$ is bounded and $\|\pi_f^0(a)\| \le \|a\|$.


Thus, by continuity and density of $\mathcal{H}_f^0$, $\pi_f^0$ extends uniquely to a mapping $\pi_f :
\mathcal{A} \to \mathcal{B}(\mathcal{H}_f)$. It is straightforward to see that this mapping is linear. This proves
property 1.

Since the product in $\mathcal{A}$ is associative, property 2 of $\pi_f$ follows easily from its
definition:

\[ \pi_f(ab)[c]_f = [(ab)c]_f = [a(bc)]_f = \pi_f(a)[bc]_f = \pi_f(a)(\pi_f(b)[c]_f) = (\pi_f(a)\pi_f(b))[c]_f, \quad \forall\, a, b, c \in \mathcal{A}. \]

In order to establish property 3 we calculate as follows, for arbitrary $a, b, c \in \mathcal{A}$:

\[ \langle \pi_f(a)^*[b]_f, [c]_f\rangle = \langle [b]_f, \pi_f(a)[c]_f\rangle = \langle [b]_f, [ac]_f\rangle = f(b^* a c) = f((a^* b)^* c) = \langle [a^* b]_f, [c]_f\rangle = \langle \pi_f(a^*)[b]_f, [c]_f\rangle, \]

hence $\pi_f(a^*) = \pi_f(a)^*$ for all $a \in \mathcal{A}$.
Finally, note that our construction gives for all $a \in \mathcal{A}$

\[ \langle \Omega_f, \pi_f(a)\Omega_f\rangle = \langle [I]_f, \pi_f(a)[I]_f\rangle = f(I^* a I) = f(a), \]

therefore the representation formula (30.19) holds.




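Uniqueness up to unitary equivalence: suppose that we also have $f$ represented as
$f(a) = \langle \Omega, \pi(a)\Omega\rangle$. Define a linear mapping $U^0 : \mathcal{H}_f^0 \to \mathcal{H}$ by

\[ U^0 [a]_f = \pi(a)\Omega, \quad \forall\, a \in \mathcal{A}. \]

It follows that $U^0$ is linear and satisfies, for all $a, b \in \mathcal{A}$,

\[ \langle U^0[a]_f, U^0[b]_f\rangle = \langle \pi(a)\Omega, \pi(b)\Omega\rangle = \langle \Omega, \pi(a^* b)\Omega\rangle = f(a^* b) = \langle [a]_f, [b]_f\rangle. \]

We conclude that $U^0$ is an isometry defined on the dense subspace $\mathcal{H}_f^0$ with
dense range $\pi(\mathcal{A})\Omega$, hence it extends continuously to a unique unitary operator
$U : \mathcal{H}_f \to \mathcal{H}$. Next calculate

\[ \pi(a)\pi(b)\Omega = \pi(ab)\Omega = U[ab]_f = U\pi_f(a)[b]_f = U\pi_f(a)U^* U[b]_f = U\pi_f(a)U^* \pi(b)\Omega \]

for all $a, b \in \mathcal{A}$. Since $\pi(\mathcal{A})\Omega$ is dense in $\mathcal{H}$ we conclude $\pi(a) = U\pi_f(a)U^*$ for
all $a \in \mathcal{A}$.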
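
To make the construction concrete, here is a minimal sketch for the finite-dimensional case $\mathcal{A} = M_2(\mathbb{C})$ with the vector state $f(a) = \langle e_1, a e_1\rangle$. This example and all helper names (`gns_vector`, `apply`, and so on) are assumptions chosen for illustration and do not come from the text. Here $f(a^*a) = \|a e_1\|^2$, so $I_f$ consists of the matrices with vanishing first column, the class $[a]_f$ can be identified with the column $a e_1 \in \mathbb{C}^2$, and $\pi_f(a)$ acts as $a$ itself.

```python
# Hypothetical illustration: A = M_2(C), f(a) = <e1, a e1> (a vector state).

def matmul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def f(a):
    """The state f(a) = <e1, a e1>, i.e. the upper-left matrix entry."""
    return a[0][0]

def gns_vector(a):
    """Representative of the class [a]_f: the first column a e1.
    Matrices with the same first column differ by an element of I_f."""
    return [a[0][0], a[1][0]]

def inner(u, v):
    """Scalar product on C^2, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

def apply(a, v):
    """Action of pi_f(a) on a vector of C^2 (in this example simply a v)."""
    return [sum(a[i][k] * v[k] for k in range(2)) for i in range(2)]

identity = [[1, 0], [0, 1]]
omega = gns_vector(identity)          # cyclic vector Omega_f = [I]_f = e1

a = [[1 + 2j, 3], [4j, 5]]
b = [[0, 1], [2 - 1j, 7]]
n = [[0, 6], [0, -2j]]                # first column zero, so n lies in I_f

null_value = f(matmul(adjoint(n), n))       # f(n* n), zero for n in I_f
lhs = inner(gns_vector(a), gns_vector(b))   # <[a]_f, [b]_f>
rhs = f(matmul(adjoint(a), b))              # f(a* b), formula (30.23)
recovered = inner(omega, apply(a, omega))   # <Omega_f, pi_f(a) Omega_f>
```

In this sketch `lhs` and `rhs` agree, which is formula (30.23), and `recovered` reproduces `f(a)`, which is the representation formula (30.19).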


30.3 Normal States 


In the last section we considered positive linear functionals $f$ on a C*-algebra $\mathcal{A}$ with
unit $I$ satisfying $f(I) = 1$. Such functionals are called states of $\mathcal{A}$. Here under an
additional continuity assumption we determine the general form of states in the case
where $\mathcal{A}$ is a weakly closed subalgebra of $\mathcal{B}(\mathcal{H})$ (see Sect. 26.3), i.e., if $\mathcal{A}$ is a von
Neumann algebra. Such an algebra has the remarkable property of being equal to
its bi-commutant (see [3]), i.e., $\mathcal{A} = \mathcal{A}'' = (\mathcal{A}')'$ where the commutant $\mathcal{A}'$ is defined
by

\[ \mathcal{A}' = \{B \in \mathcal{B}(\mathcal{H}) : BA = AB \ \text{for all } A \in \mathcal{A}\}. \]

By Proposition 30.1 states are continuous for the (operator) norm when $\mathcal{A}$ is a
C*-algebra. Note that states generate the cone of positive linear functionals.
Simple examples of states are vector states $\mu_x$, $x \in \mathcal{H}$, $\|x\| = 1$, defined by

\[ \mu_x(A) = \langle x, A x\rangle, \quad A \in \mathcal{A}. \tag{30.26} \]

Another class of examples is obtained as follows. In Formula 26.18 choose $g_n = e_n$
with $\{e_n\} \in \ell^2(\mathcal{H})$ and $\sum_n \|e_n\|^2 = 1$. Then the operator $T = \sum_n [e_n, e_n]$ is a
positive trace class operator with $\mathrm{Tr}(T) = \sum_n \|e_n\|^2 = 1$ and the formula

\[ T(A) = \mathrm{Tr}(TA) = \sum_n \langle e_n, A e_n\rangle \tag{30.27} \]

defines a state on $\mathcal{B}(\mathcal{H})$.
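
The identity (30.27) can be checked numerically in a small case. The following sketch works in $\mathcal{H} = \mathbb{C}^2$ with two hypothetical vectors $e_1, e_2$ chosen so that $\|e_1\|^2 + \|e_2\|^2 = 1$; it assumes that $[e, e]$ denotes the rank-one operator $x \mapsto \langle e, x\rangle e$, and all function names are illustrative only.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(a, v):
    """Matrix times vector."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def inner(u, v):
    """Scalar product, antilinear in the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def rank_one(e):
    """Matrix of the operator x -> <e, x> e (assumed meaning of [e, e])."""
    return [[e[i] * complex(e[j]).conjugate() for j in range(len(e))]
            for i in range(len(e))]

# Hypothetical data: ||e_1||^2 = 0.36 and ||e_2||^2 = 0.64
vectors = [[0.6, 0.0], [0.48, 0.64j]]
T = [[sum(rank_one(e)[i][j] for e in vectors) for j in range(2)]
     for i in range(2)]

def state(a):
    """Left-hand side of (30.27): Tr(T a)."""
    return trace(matmul(T, a))

def state_as_sum(a):
    """Right-hand side of (30.27): sum_n <e_n, a e_n>."""
    return sum(inner(e, matvec(a, e)) for e in vectors)

identity = [[1.0, 0.0], [0.0, 1.0]]
A = [[2.0, 1j], [-1j, 3.0]]          # a positive Hermitian matrix
```

Up to rounding, `state(identity)` equals 1 (normalization), and `state(A)` coincides with `state_as_sum(A)` and is a nonnegative real number, as expected for a state evaluated on a positive element.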




It turns out that under the conditions we are considering every state on A will be 
of this form. This result is used quite often in quantum physics and naturally in the 
theory of operator algebras. 


Definition 30.5 A positive linear functional $\mu$ on a von Neumann algebra $\mathcal{A} \subset
\mathcal{B}(\mathcal{H})$ is called
1. normal if for every bounded increasing net $\{A_i : i \in I\} \subset \mathcal{A}_h =
\{A \in \mathcal{A} : A^* = A\}$¹ one has

\[ \mu\bigl(\sup_{i \in I} A_i\bigr) = \sup_{i \in I} \mu(A_i). \tag{30.28} \]

2. completely additive if for every orthogonal family of projections $p_i$ in $\mathcal{A}$ one
has

\[ \mu\Bigl(\sum_{i \in I} p_i\Bigr) = \sum_{i \in I} \mu(p_i). \tag{30.29} \]

Note that a family of projections is called orthogonal if any two different projections
are orthogonal, i.e., $p_i p_j = 0$ for $i \neq j$. In this context it is important to be aware
of the following simple result.


Proposition 30.2 (Theorem of Vigier) If $\{A_i : i \in I\} \subset \mathcal{B}(\mathcal{H})$ is a bounded
increasing net of self-adjoint operators then there is a self-adjoint operator $A = \sup_i A_i$
such that

\[ A = \lim_i A_i \]

in the strong topology on $\mathcal{B}(\mathcal{H})$.


Proof Every $x \in \mathcal{H}$ defines an increasing net $\langle x, A_i x\rangle$ in $\mathbb{R}$ which is bounded by
$C \|x\|^2$ if $C$ denotes the bound for the given net ($\|A_i\| \le C$ for all $i \in I$), hence
the net converges. The polarization identity (Proposition 14.1.2) implies that the net
$\langle y, A_i x\rangle$ converges for any fixed $x, y \in \mathcal{H}$; denote the limit by $B(y, x)$. It follows
that $B(y, x)$ is a symmetric sesquilinear form on $\mathcal{H}$ bounded by $C \|y\| \|x\|$. Such
forms define a unique self-adjoint operator $A \in \mathcal{B}(\mathcal{H})$ by $B(y, x) = \langle y, A x\rangle$ for all
$x, y \in \mathcal{H}$. By construction $\langle y, A x\rangle = \lim_i \langle y, A_i x\rangle$, hence $A = \lim_i A_i$ for the weak
topology on $\mathcal{B}(\mathcal{H})$. Furthermore, for every $x \in \mathcal{H}$,

\[ \langle x, A x\rangle = \lim_i \langle x, A_i x\rangle = \sup_i \langle x, A_i x\rangle, \]

hence $A = \sup_i A_i$.
Since the net is bounded it follows that $A = \lim_i A_i$ for the strong topology on
$\mathcal{B}(\mathcal{H})$: for every $x \in \mathcal{H}$ we can estimate $Ax - A_i x$, using $A - A_i \ge 0$,

\[ \|(A - A_i)x\|^2 = \|(A - A_i)^{1/2}(A - A_i)^{1/2}x\|^2 \le \|(A - A_i)^{1/2}\|^2\, \|(A - A_i)^{1/2}x\|^2 = \|A - A_i\|\, \langle x, (A - A_i)x\rangle \le 2C \langle x, (A - A_i)x\rangle; \]

thus weak convergence of the net implies strong convergence.

¹ A net in $\mathcal{A}_h$ is a function on some directed set $I$ with values $A_i \in \mathcal{A}_h$, $i \in I$, see [4].
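
As a supplementary illustration of Proposition 30.2 (a standard example, not taken from the text), consider on $\ell^2(\mathbb{N})$ the projections $P_n$ onto the first $n$ coordinates. They form a bounded increasing sequence whose supremum is the identity, and the convergence is strong but not in norm:

```latex
\[
  P_n x = (x_1, \dots, x_n, 0, 0, \dots), \qquad P_1 \le P_2 \le \dots \le I ,
\]
\[
  \|(I - P_n)x\|^2 = \sum_{j > n} |x_j|^2 \longrightarrow 0 \quad (n \to \infty),
  \qquad \|I - P_n\| = 1 \ \text{ for all } n .
\]
```

This shows that the strong topology in the proposition cannot in general be replaced by the norm topology.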


Note that in our context this result implies that $\sup_i A_i \in \mathcal{A}$ so that (30.28) is
meaningful. The main result of this section is


Theorem 30.5 (Characterization of Normal States) For a state $\mu$ on a von Neumann
algebra $\mathcal{A} \subseteq \mathcal{B}(\mathcal{H})$ the following statements are equivalent:


(a) $\mu$ is normal;
(b) $\mu$ is completely additive;
(c) $\mu$ is of the form

\[ \mu(A) = \mathrm{Tr}(AW), \quad A \in \mathcal{A} \tag{30.30} \]

with a positive trace class operator $W$ with $\mathrm{Tr}(W) = 1$.


Proof For the proof we proceed in the order of the implications (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$
(a).

(a) $\Rightarrow$ (b): Let $\{p_i : i \in I\}$ be any orthogonal family of projections in $\mathcal{A}$. For
finite parts $J$ of the index set $I$ introduce the projection $p_J = \sum_{i \in J} p_i$. Then
$\{p_J : J \subset I, J \text{ finite}\}$ is a monotone increasing net which is bounded by id. Thus
by Proposition 30.2

\[ \lim_J p_J = \sum_{i \in I} p_i. \]

Since $\mu$ is assumed to be normal it follows

\[ \mu\Bigl(\sum_{i \in I} p_i\Bigr) = \lim_J \mu(p_J) = \lim_J \sum_{i \in J} \mu(p_i) = \sum_{i \in I} \mu(p_i), \]

hence $\mu$ is completely additive.

(b) $\Rightarrow$ (c): This is the core of the proof. The main technical part of the argument
is formulated in the following Proposition 30.3. This proposition states that a
completely additive state $\mu$ on $\mathcal{A}$ is strongly continuous when restricted to the unit ball
$\mathcal{A}_1$ of $\mathcal{A}$. Then the second part of Theorem 26.6 implies that $\mu$ is of the form

\[ \mu(A) = \sum_n \langle g_n, A e_n\rangle, \quad A \in \mathcal{A} \]

with $\{g_n\}, \{e_n\} \in \ell^2(\mathcal{H})$ where we used (26.18). As in the introductory example the
form (30.30) of $\mu$ follows.

(c) $\Rightarrow$ (a): Again according to the second part of Theorem 26.6 $\mu$ is $\sigma$-weakly
continuous if (c) is assumed. On bounded sets the weak and $\sigma$-weak topology agree
according to Lemma 26.2. Thus by Proposition 30.2 we conclude.




Proposition 30.3 Every completely additive state $\mu$ on a von Neumann algebra
$\mathcal{A} \subseteq \mathcal{B}(\mathcal{H})$ is strongly continuous when restricted to the unit ball $\mathcal{A}_1$ of $\mathcal{A}$.


Proof For the proof we have to find suitable seminorms for the strong topology by
which we can estimate the given state. This could be achieved by finding suitable
vector states which dominate $\mu$. This idea can be realized first on certain parts of $\mathcal{A}$
and then on all of $\mathcal{A}$.

Claim 1: There are a nonzero projection $p \in \mathcal{A}$ and a vector $x \in \mathcal{H}$ such that

\[ \mu \le \mu_x \quad \text{on } p\mathcal{A}p. \]

For the proof of this claim choose $x \in \mathcal{H}$, $\|x\| = 1$. Then we have $\mu_x(I) = \langle x, I x\rangle =
1 = \mu(I)$. Introduce $P_0 = \{p \in \mathcal{A} : p = \text{projection},\ \mu_x(p) < \mu(p)\}$. If $P_0$ is empty,
then $\mu(p) \le \mu_x(p)$ for all projections $p$ in $\mathcal{A}$ and we are done.

If $P_0$ is not empty consider the collection $\mathcal{P}$ of all subsets

\[ P = \{p_i \in P_0 : p_i \ \text{mutually orthogonal}\} \subset P_0. \]

By set inclusion $\mathcal{P}$ is a partially ordered set in which every chain has an upper bound
(the union of the elements of this chain). Hence by Zorn's lemma,² $\mathcal{P}$ has a maximal
element $P$. Then $p = \sum_{p_i \in P} p_i \le I$ and thus, since $\mu$ is completely additive,

\[ \mu_x(p) = \sum_{p_i \in P} \mu_x(p_i) < \sum_{p_i \in P} \mu(p_i) = \mu\Bigl(\sum_{p_i \in P} p_i\Bigr) = \mu(p) \le \mu(I) = \mu_x(I), \]

hence $p < I$ and therefore $q = I - p \neq 0$. Since every projection $q' \in q\mathcal{A}q$ is
orthogonal to each $p_i \in P$, by maximality of $P$ we know $q' \notin P_0$, and thus for all
projections $q' \in q\mathcal{A}q$ it follows $\mu(q') \le \mu_x(q')$.

According to the spectral theorem (Theorem 27.3),³ every positive $A \in q\mathcal{A}q$
is the norm limit of linear combinations $\sum_{j=1}^n \lambda_j q_j$ of projections $q_j \in q\mathcal{A}q$ with
positive coefficients $\lambda_j$. The above estimate implies

\[ \mu\Bigl(\sum_{j=1}^n \lambda_j q_j\Bigr) = \sum_{j=1}^n \lambda_j \mu(q_j) \le \sum_{j=1}^n \lambda_j \mu_x(q_j) = \mu_x\Bigl(\sum_{j=1}^n \lambda_j q_j\Bigr). \]

Since $\mu$ and $\mu_x$ are continuous with respect to the norm topology we get in the limit

\[ \mu(A) \le \mu_x(A), \quad A \in q\mathcal{A}q,\ A \ge 0, \]

and Claim 1 follows.


² This lemma says: Every partially ordered set $\mathcal{P}$ in which every linearly ordered subset has an
upper bound contains at least one maximal element.


³ Our version of the spectral theorem proves this claim only for the case $\mathcal{A} = \mathcal{B}(\mathcal{H})$. For the general
case of $\mathcal{A} \subset \mathcal{B}(\mathcal{H})$, we have to refer to Theorem 5.2.2 of [5].




Claim 2: There is a family $\{p_i\}$ of mutually orthogonal projections in $\mathcal{A}$ and of
points $x_i \in \mathcal{H}$ such that

\[ \mu \le \mu_{x_i} \ \text{on } p_i \mathcal{A} p_i \quad \text{and} \quad \sum_i p_i = I. \tag{30.31} \]

The proof of this claim relies again on Zorn's lemma. According to the first claim,
we know that the set

\[ S_0 = \{(p, x) : p \ \text{projection in } \mathcal{A},\ x \in \mathcal{H},\ \mu \le \mu_x \ \text{on } p\mathcal{A}p\} \]

is not empty. Then consider the collection $\mathcal{S}$ of subsets

\[ S = \{(p_i, x_i) \in S_0 : \{p_i\} \ \text{mutually orthogonal}\}. \]

$\mathcal{S}$ is partially ordered by set inclusion and then every chain in $\mathcal{S}$ has an upper bound.
Zorn's lemma implies that $\mathcal{S}$ has a maximal element $S_m$. Define $p$ as the sum of
all projections $p_i$ for which $(p_i, x_i) \in S_m$: $p = \sum_i p_i$. If $p < I$ then $q = I - p$
is a nontrivial projection. Apply the statement of the first claim to $q\mathcal{A}q \subset \mathcal{B}(q\mathcal{H})$.
Hence, there is $y \in q\mathcal{H}$ and a projection $p_0 \in q\mathcal{A}q$ such that $\mu \le \mu_y$ on $p_0 q\mathcal{A}q p_0$.
By construction $p_0$ is orthogonal to all projections $p_i$ with $(p_i, x_i) \in S_m$.
This contradicts the maximality of $S_m$ and therefore we get $p = \sum_i p_i = I$ and
(30.31) follows.

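Claim 3: Given $\varepsilon > 0$ there is a neighborhood of zero $U$ for the strong topology
such that $|\mu(A)| \le \varepsilon$ for all $A \in U \cap \mathcal{A}_1$.

The proof of this claim follows now easily from (30.31): Since $\mu$ is a completely
additive state we know

\[ 1 = \mu(I) = \mu\Bigl(\sum_i p_i\Bigr) = \sum_i \mu(p_i); \]

thus, the index set of our maximal family $S_m$ is countable and there is a finite subset
$J$ of the index set of our maximal family $S_m$ such that $\sum_{i \in J} \mu(p_i) \ge 1 - \varepsilon^2/4$. For
$q = I - \sum_{i \in J} p_i$ this gives $\mu(q) \le \varepsilon^2/4$. Define $U = \{A \in \mathcal{A} : \sum_{i \in J} \|A p_i x_i\| \le \varepsilon/2\}$
and observe $\mu(A) = \mu(Aq) + \mu(A \sum_{i \in J} p_i)$. For all $A \in \mathcal{A}$ with $\|A\| \le 1$ we
estimate as follows:

\[ |\mu(Aq)| \le \mu(A^* A)^{1/2}\, \mu(q^* q)^{1/2} \le \mu(q)^{1/2} \le \varepsilon/2 \]

and similarly

\[ \Bigl|\mu\Bigl(A \sum_{i \in J} p_i\Bigr)\Bigr| \le \sum_{i \in J} |\mu(A p_i)| \le \sum_{i \in J} \mu(I)^{1/2}\, \mu((A p_i)^* A p_i)^{1/2} = \sum_{i \in J} \mu(p_i A^* A p_i)^{1/2} \le \sum_{i \in J} \mu_{x_i}(p_i A^* A p_i)^{1/2} = \sum_{i \in J} \|A p_i x_i\|. \]

Putting these estimates together gives our Claim 3 for the neighborhood $U$
introduced above.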
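
Written out, the combination of the two estimates reads as follows (a short supplementary summary of the step just described, using the splitting of $\mu(A)$ into the parts on $q$ and on $\sum_{i \in J} p_i$):

```latex
\[
  |\mu(A)| \;\le\; |\mu(Aq)| + \Bigl|\mu\Bigl(A \sum_{i \in J} p_i\Bigr)\Bigr|
  \;\le\; \frac{\varepsilon}{2} + \sum_{i \in J} \|A p_i x_i\|
  \;\le\; \frac{\varepsilon}{2} + \frac{\varepsilon}{2} \;=\; \varepsilon
  \qquad \text{for all } A \in U \text{ with } \|A\| \le 1 .
\]
```

The set $U$ is a neighborhood of zero for the strong topology because it is defined through finitely many seminorms $A \mapsto \|A p_i x_i\|$, $i \in J$.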




30.4 Completely Positive Maps 


In Sect. 30.2.1, the general form of positive linear maps

\[ f : \mathcal{A} \to \mathbb{C} = M_1(\mathbb{C}) \]

has been determined for C*-algebras. A natural extension of this problem is to look for
the general form of positive linear maps with values in the space $M_k(\mathbb{C})$ of complex
$k \times k$ matrices

\[ F : \mathcal{A} \to M_k(\mathbb{C}), \quad k > 1, \]

or, even more generally, for mappings with values in a C*-algebra of operators on a
Hilbert space $\mathcal{H}$,

\[ F : \mathcal{A} \to \mathcal{B}, \quad \mathcal{B} \subset \mathcal{B}(\mathcal{H}) \tag{30.32} \]

extending the representation formula (30.19). This problem was investigated and
solved by Stinespring in 1955 for the general case of C*-algebras [6]. It was found
that one can arrive at a representation formula similar to (30.19) if one imposes on
$F$ a stronger positivity requirement, namely that of complete positivity.


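30.4.1 Positive Elements in $M_k(\mathcal{A})$


Let $\mathcal{A}$ be a C*-algebra. For $k = 1, 2, \dots$ introduce the space $M_k(\mathcal{A})$ of $k \times k$ matrices
$[a_{ij}]$ with entries $a_{ij} \in \mathcal{A}$, $i, j = 1, \dots, k$. In a natural way this space is a C*-algebra
(with unit if $\mathcal{A}$ has a unit) (see [3]).
According to our earlier discussion we call an element

\[ a = [a_{ij}] = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}, \quad a_{ij} \in \mathcal{A}, \]

positive, $a \ge 0$ if, and only if, $a = c^* c$ for some $c \in M_k(\mathcal{A})$.
If $\mathcal{A}$ is a C*-algebra of operators on a Hilbert space $\mathcal{H}$ then $a = [a_{ij}] \in M_k(\mathcal{A})$
acts naturally on

\[ \mathcal{H}^k = \{\xi = (\xi_1, \dots, \xi_k) : \xi_i \in \mathcal{H},\ i = 1, \dots, k\} \tag{30.33} \]

according to the rule

\[ ([a_{ij}]\xi)_i = \sum_{j=1}^k a_{ij} \xi_j, \quad \xi \in \mathcal{H}^k. \tag{30.34} \]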

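The space (30.33) is a Hilbert space with the scalar product

\[ \langle \xi, \eta\rangle_{\mathcal{H}^k} = \sum_{j=1}^k \langle \xi_j, \eta_j\rangle_{\mathcal{H}}, \quad \forall\, \xi, \eta \in \mathcal{H}^k. \tag{30.35} \]

Positive elements in $M_k(\mathcal{A})$ are characterized by the following lemma (see [3]):

Lemma 30.1 The following conditions are equivalent for an element $a = [a_{ij}] \in M_k(\mathcal{A})$:

(1) $a = b^* b$ for some $b \in M_k(\mathcal{A})$;
(2) $\langle \xi, a\xi\rangle_{\mathcal{H}^k} \ge 0$ for all $\xi \in \mathcal{H}^k$;
(3) $a = [a_{ij}]$ is a sum of matrices of the form $[a_i^* a_j]$, $a_1, \dots, a_k \in \mathcal{A}$;
(4) For all $x_1, \dots, x_k \in \mathcal{A}$ one has

\[ \sum_{i,j=1}^k x_i^* a_{ij} x_j \ge 0 \quad \text{in } \mathcal{A}. \]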


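Proof (1) $\Rightarrow$ (3): If $a = b^* b$ for some $b \in M_k(\mathcal{A})$, then $a_{ij} = (b^* b)_{ij} =
\sum_{m=1}^k b_{mi}^* b_{mj}$; $C_m = [b_{mi}^* b_{mj}] \in M_k(\mathcal{A})$ is of the claimed form and $a = \sum_{m=1}^k C_m$.
Thus (3) holds.
(3) $\Rightarrow$ (4): If we know $a = [a_i^* a_j]$ for some $a_j \in \mathcal{A}$ and if any elements
$x_1, \dots, x_k \in \mathcal{A}$ are given, then

\[ \sum_{i,j=1}^k x_i^* a_{ij} x_j = \sum_{i,j=1}^k x_i^* a_i^* a_j x_j = \Bigl(\sum_{i=1}^k a_i x_i\Bigr)^* \Bigl(\sum_{j=1}^k a_j x_j\Bigr), \]

now $b = \sum_{j=1}^k a_j x_j \in \mathcal{A}$ and

\[ \sum_{i,j=1}^k x_i^* a_{ij} x_j = b^* b, \]

hence this sum is positive.
(4) $\Rightarrow$ (2): If (4) holds, then by condition (30.14), for all $x \in \mathcal{H}$ and all $x_i \in \mathcal{A}$,

\[ \Bigl\langle x, \sum_{i,j=1}^k x_i^* a_{ij} x_j\, x\Bigr\rangle_{\mathcal{H}} = \sum_{i,j=1}^k \langle x_i x, a_{ij} x_j x\rangle_{\mathcal{H}} \ge 0. \]

It follows

\[ \langle \xi, a\xi\rangle_{\mathcal{H}^k} \ge 0 \]

for all $\xi \in \mathcal{H}^k$ which are of the form

\[ \xi = (x_1 x, \dots, x_k x), \quad x \in \mathcal{H},\ x_j \in \mathcal{A}. \]

But this set equals $\mathcal{H}^k$, hence (2) holds.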


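(2) $\Rightarrow$ (1): If (2) holds, the square root lemma (Theorem 21.5.1) implies that $a$
has a positive square root $b = \sqrt{a} \in M_k(\mathcal{A})$ such that $a = b^2 = b^* b$ and (1) follows.

Note that (1) $\Rightarrow$ (2) is trivial and also (3) $\Rightarrow$ (1) is simple. If $a$ is of the form
$[a_i^* a_j]$, then

\[ a = \begin{pmatrix} a_1^* a_1 & \cdots & a_1^* a_k \\ \vdots & & \vdots \\ a_k^* a_1 & \cdots & a_k^* a_k \end{pmatrix}
   = \begin{pmatrix} a_1^* & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ a_k^* & 0 & \cdots & 0 \end{pmatrix}
     \begin{pmatrix} a_1 & \cdots & a_k \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix} = b^* b, \]

hence (1).


Elements in $M_k(\mathcal{A})$ which satisfy any of the 4 equivalent conditions of Lemma
30.1 are called positive.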
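
As a supplementary illustration (not from the text), take the simplest case $\mathcal{A} = \mathbb{C}$ and $k = 2$, so that $M_2(\mathcal{A}) = M_2(\mathbb{C})$ and $\mathcal{H}^2 = \mathbb{C}^2$. For complex numbers $a_1, a_2$ the matrix $[a_i^* a_j]$ of condition (3) satisfies condition (2) by a direct calculation:

```latex
\[
  a = \begin{pmatrix} \bar{a}_1 a_1 & \bar{a}_1 a_2 \\ \bar{a}_2 a_1 & \bar{a}_2 a_2 \end{pmatrix},
  \qquad
  \langle \xi, a\xi\rangle_{\mathbb{C}^2}
  = \sum_{i,j=1}^{2} \bar{\xi}_i\, \bar{a}_i a_j\, \xi_j
  = |a_1 \xi_1 + a_2 \xi_2|^2 \ge 0 .
\]
```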


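Lemma 30.2 Let $\mathcal{A}$ be a C*-algebra of operators in a Hilbert space $\mathcal{H}$. Then, given
any $a_1, \dots, a_k \in \mathcal{A}$ one has for all $a \in \mathcal{A}$

\[ [(a a_i)^* (a a_j)] \le \|a\|^2 [a_i^* a_j] \quad \text{in } M_k(\mathcal{A}). \tag{30.36} \]

Proof The matrix of operators $[(a a_i)^* (a a_j)]$ acts on the Hilbert space $\mathcal{H}^k$ according
to (30.34) and for all $x = (x_1, \dots, x_k) \in \mathcal{H}^k$ we have

\[ \langle x, [(a a_i)^* (a a_j)] x\rangle_{\mathcal{H}^k} = \Bigl\|\sum_j a a_j x_j\Bigr\|_{\mathcal{H}}^2 \le \|a\|^2 \Bigl\|\sum_j a_j x_j\Bigr\|_{\mathcal{H}}^2 = \|a\|^2 \langle x, [a_i^* a_j] x\rangle_{\mathcal{H}^k}, \]

thus (30.36) follows.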


30.4.2. Some Basic Properties of Positive Linear Mappings 


In the proof of the Stinespring factorization theorem for completely positive maps, 
we need some basic properties of positive linear maps. These are briefly discussed 
here. 

Let $\mathcal{A}$ and $\mathcal{B}$ be *-algebras. Recall: Elements $a$ in $\mathcal{A}$ are called positive (more
accurately, nonnegative), in symbols $a \ge 0$, if there is $b \in \mathcal{A}$ such that $a = b^* b$.
A corresponding characterization of positive elements applies to $\mathcal{B}$. Furthermore, a
linear mapping $T : \mathcal{A} \to \mathcal{B}$ is called positive if, and only if, for all $a \in \mathcal{A}$ with
$a \ge 0$ one has $T(a) \ge 0$ (in $\mathcal{B}$).

Knowing what positive elements are, we can define an order on $\mathcal{A}$ and $\mathcal{B}$: For
$a_1, a_2 \in \mathcal{A}$ one says that $a_1$ is smaller or equal to $a_2$, in symbols $a_1 \le a_2$, if, and only
if, $a_2 - a_1 \ge 0$.


For any positive linear mapping $T : \mathcal{A} \to \mathcal{B}$, the following holds:

\[ a_1, a_2 \in \mathcal{A},\ a_1 \le a_2 \ \Rightarrow\ T(a_1) \le T(a_2). \tag{30.37} \]

Positive linear maps $T : \mathcal{A} \to \mathcal{B}$ satisfy the following important estimates:


Lemma 30.3 Suppose that $\mathcal{A}$ is a C*-algebra with unit $I$. Then any positive linear
map $T : \mathcal{A} \to \mathcal{B}$ satisfies

\[ T(x^* a^* a x) \le \|a\|^2\, T(x^* x) \quad \forall\, a, x \in \mathcal{A}. \tag{30.38} \]

In particular

\[ T(a^* a) \le \|a\|^2\, T(I) \quad \forall\, a \in \mathcal{A} \]

and thus $T(I) = 0$ implies $T = 0$.
Proof From (30.17) we know for all $a \in \mathcal{A}$

\[ a^* a \le \|a\|^2 I, \tag{30.39} \]

hence for all $x \in \mathcal{A}$,

\[ x^* a^* a x \le \|a\|^2 x^* x, \]

and thus for any positive linear mapping $T : \mathcal{A} \to \mathcal{B}$ estimate (30.38) follows.
If we choose $x = I$ we get the estimate for $T(a^* a)$. This estimate implies that $T$
vanishes on all positive elements of $\mathcal{A}$ if $T(I) = 0$. Now observe that every $a \in \mathcal{A}$
can be written as

\[ a = \frac{1}{2}(a + a^*) + i\, \frac{1}{2i}(a - a^*), \]

where the elements $a_r = \frac{1}{2}(a + a^*)$ and $a_i = \frac{1}{2i}(a - a^*)$ are self-adjoint (Hermitian),
i.e., $a_{r,i}^* = a_{r,i}$. From spectral theory it follows that every self-adjoint $b \in \mathcal{A}$ can be
written as the difference of two positive elements in $\mathcal{A}$, $b = b_+ - b_-$ with $b_\pm \ge 0$.
By linearity of $T$ we conclude that $T$ vanishes on all of $\mathcal{A}$.


30.4.3 Completely Positive Maps Between C*-Algebras 


Suppose that a linear map $F : \mathcal{A} \to \mathcal{B}$ between two C*-algebras $\mathcal{A}$, $\mathcal{B}$ is given. For
$k = 1, 2, \dots$ it induces a map

\[ F_k : M_k(\mathcal{A}) \to M_k(\mathcal{B}), \qquad F_k([a_{ij}]) = [F(a_{ij})], \tag{30.40} \]

for all $a_{ij} \in \mathcal{A}$, $i, j = 1, 2, \dots, k$.


Definition 30.6 A linear map $F : \mathcal{A} \to \mathcal{B}$ as above is called k-positive if, and
only if, $F_k$ is positive, i.e., if $F_k$ maps positive elements of $M_k(\mathcal{A})$ to positive elements
of $M_k(\mathcal{B})$. If $F$ is k-positive for all $k \in \mathbb{N}$ then $F$ is called completely positive.
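
The difference between positivity and 2-positivity can be seen in a standard example that is not part of the text: transposition $F(a) = a^T$ on $M_2(\mathbb{C})$ is a positive map, but $F_2$ is not positive. The sketch below applies $F_2$ from (30.40) to the block matrix $[E_{ij}]$ of matrix units, which is positive since $E_{ij} = E_{1i}^* E_{1j}$, and evaluates a quadratic form on $\mathcal{H}^2 = \mathbb{C}^4$. All function names are hypothetical.

```python
def unit(i, j):
    """Matrix unit E_ij in M_2(C) (indices 0 and 1)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(2)] for r in range(2)]

def transpose(a):
    """The map F(a) = a^T on M_2(C)."""
    return [[a[c][r] for c in range(2)] for r in range(2)]

def induced(F, blocks):
    """F_2 of (30.40): apply F to every entry of a 2x2 block matrix."""
    return [[F(blocks[i][j]) for j in range(2)] for i in range(2)]

def flatten(blocks):
    """Turn a 2x2 matrix of 2x2 blocks into a 4x4 matrix acting on C^4."""
    return [[blocks[i][j][r][c] for j in range(2) for c in range(2)]
            for i in range(2) for r in range(2)]

def quadratic_form(m, v):
    """<v, m v> for a real vector v in R^4."""
    return sum(v[r] * m[r][c] * v[c] for r in range(4) for c in range(4))

a = [[unit(i, j) for j in range(2)] for i in range(2)]   # a = [E_ij] >= 0
v = [0, 1, -1, 0]                                        # test vector

before = quadratic_form(flatten(a), v)                      # <v, a v>
after = quadratic_form(flatten(induced(transpose, a)), v)   # <v, F_2(a) v>
```

Here `before` is nonnegative while `after` is negative, so by condition (2) of Lemma 30.1 the element $F_2([E_{ij}])$ is not positive and transposition is not 2-positive.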


Remark 30.1 In physics literature, the map $F_k$ is usually written as

Naturally, our characterization of positive elements in $M_k(\mathcal{A})$ of the previous
subsection implies a characterization of k-positive and completely positive maps.
Corollary 30.1 Let $F : \mathcal{A} \to \mathcal{B}$ be as above. Then $F$ is k-positive if, and only if,

\[ \forall\, x_i \in \mathcal{A}\ \ \forall\, y_i \in \mathcal{B}: \quad \sum_{i,j=1}^k y_i^* F(x_i^* x_j) y_j \ge 0 \quad \text{in } \mathcal{B}. \tag{30.42} \]


Proof By condition (3) of Lemma 30.1 every positive element $[a_{ij}]$ in $M_k(\mathcal{A})$ is a
sum of elements of the form $[x_i^* x_j]$, $x_1, \dots, x_k \in \mathcal{A}$; hence $F$ is k-positive if, and only
if, $[F(x_i^* x_j)]$ is positive in $M_k(\mathcal{B})$. According to Condition (4) of Lemma 30.1, this
is the case if, and only if, condition (30.42) holds. Thus we conclude.


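Corollary 30.2 Let $F : \mathcal{A} \to \mathcal{B}$ be as above with $\mathcal{B} = M_1(\mathbb{C}) = \mathbb{C}$. If $F$ is
positive, then $F$ is completely positive.


Proof If $F : \mathcal{A} \to \mathbb{C}$ is positive, then $F(b^* b) \ge 0$ for all $b \in \mathcal{A}$. Using the
characterization (30.42) we show that $F$ is k-positive for all $k$. For $y_i \in \mathbb{C}$ the sum
in (30.42) can be written as

\[ \sum_{i,j=1}^k \bar{y}_i F(x_i^* x_j) y_j = F\Bigl(\Bigl(\sum_{i=1}^k x_i y_i\Bigr)^* \Bigl(\sum_{j=1}^k x_j y_j\Bigr)\Bigr) = F(b^* b) \ge 0, \qquad b = \sum_{j=1}^k y_j x_j \in \mathcal{A}. \]

Thus, by Corollary 30.1, $F$ is k-positive and we conclude.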


A first example of a completely positive map: Any homomorphism $F : \mathcal{A} \to \mathcal{B}$ of
C*-algebras is completely positive.

The proof is simple. Using Corollary 30.1 we show that a homomorphism of
*-algebras is k-positive for every $k \in \mathbb{N}$. For all $x_i \in \mathcal{A}$ and all $y_i \in \mathcal{B}$ one has, using
the properties of a homomorphism of *-algebras and Lemma 30.1,

\[ \sum_{i,j=1}^k y_i^* F(x_i^* x_j) y_j = \sum_{i,j=1}^k y_i^* F(x_i)^* F(x_j) y_j = \Bigl(\sum_{i=1}^k F(x_i) y_i\Bigr)^* \Bigl(\sum_{j=1}^k F(x_j) y_j\Bigr) \]

which is certainly $\ge 0$ in $\mathcal{B}$. Hence $F$ is k-positive for every $k$ and therefore
completely positive.

Our next example is just a slight extension of the first. Let $\pi : \mathcal{A} \to \mathcal{B}$ be a
*-homomorphism of $\mathcal{A}$ and $V$ some element in $\mathcal{B}$; define $F : \mathcal{A} \to \mathcal{B}$ by

\[ F(a) = V^* \pi(a) V, \quad \forall\, a \in \mathcal{A}. \tag{30.43} \]

A similar calculation as above shows that $F$ is a completely positive map. For all
$x_i \in \mathcal{A}$ and all $y_i \in \mathcal{B}$ one has, using the properties of a homomorphism of *-algebras
and Lemma 30.1,

\[ \sum_{i,j=1}^k y_i^* F(x_i^* x_j) y_j = \sum_{i,j=1}^k y_i^* V^* \pi(x_i^* x_j) V y_j = \Bigl(\sum_{i=1}^k \pi(x_i) V y_i\Bigr)^* \Bigl(\sum_{j=1}^k \pi(x_j) V y_j\Bigr) \ge 0 \quad \text{in } \mathcal{B}. \]

Note that if $V$ is not unitary then the map $F$ of (30.43) is not a representation of $\mathcal{A}$.


30.4.4 Stinespring Factorization Theorem for Completely Positive 
Maps 


The Stinespring factorization theorem shows that essentially all completely positive 
maps are of the form (30.43). The proof is a straightforward extension of the proof 
for the GNS-construction. 

We state and prove this result explicitly for the case where A and B are C*-algebras 
of operators on a Hilbert space. The general case is given in [3]. 


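Theorem 30.6 (Stinespring Factorization Theorem) Let $\mathcal{A}$ be a C*-algebra with
unit $I$ and $\mathcal{B} \subset \mathcal{B}(\mathcal{H})$ be a C*-algebra of operators in a Hilbert space $\mathcal{H}$. Then
for every completely positive map $f : \mathcal{A} \to \mathcal{B}$ there exist a Hilbert space $\mathcal{K}_f$, a
representation $\pi_f$ of $\mathcal{A}$ in $\mathcal{K}_f$, and a bounded linear operator $V : \mathcal{H} \to \mathcal{K}_f$ such
that

\[ f(a) = V^* \pi_f(a) V \quad \forall\, a \in \mathcal{A}. \tag{30.44} \]

Furthermore, for all $\xi \in \mathcal{H}$, $\|V\xi\|_f = \|f(I)^{1/2}\xi\|_{\mathcal{H}}$.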


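Proof Construction of $\mathcal{K}_f$: On the algebraic tensor product $\mathcal{A} \otimes \mathcal{H}$ define, for
elements $\zeta = \sum_{i=1}^k a_i \otimes \xi_i$, $\chi = \sum_{j=1}^l b_j \otimes \eta_j$ in $\mathcal{A} \otimes \mathcal{H}$,

\[ \langle \zeta, \chi\rangle_f = \sum_{i=1}^k \sum_{j=1}^l \langle \xi_i, f(a_i^* b_j)\eta_j\rangle_{\mathcal{H}}. \tag{30.45} \]

One verifies that this formula defines a sesquilinear form on $\mathcal{A} \otimes \mathcal{H}$. In particular, in
the notation of Sect. 30.4.1,

\[ \langle \zeta, \zeta\rangle_f = \sum_{i,j=1}^k \langle \xi_i, f(a_i^* a_j)\xi_j\rangle_{\mathcal{H}} = \langle \xi, [f(a_i^* a_j)]\xi\rangle_{\mathcal{H}^k} = \langle \xi, f_k([a_i^* a_j])\xi\rangle_{\mathcal{H}^k}. \]

According to Lemma 30.1 the element $a = [a_i^* a_j] \in M_k(\mathcal{A})$ is positive and, since
$f$ is completely positive, $f_k$ is a positive mapping from $M_k(\mathcal{A})$ into $M_k(\mathcal{B})$, hence
$f_k([a_i^* a_j])$ is a positive matrix on $\mathcal{H}^k$ and we conclude $\langle \zeta, \zeta\rangle_f \ge 0$. Therefore the
sesquilinear form (30.45) is positive semidefinite and hence it satisfies the
Cauchy-Schwarz inequality

\[ |\langle \zeta, \chi\rangle_f|^2 \le \langle \zeta, \zeta\rangle_f\, \langle \chi, \chi\rangle_f. \]

We conclude that the kernel

\[ I_f = \Bigl\{\zeta = \sum_{i=1}^k a_i \otimes \xi_i \in \mathcal{A} \otimes \mathcal{H} : \langle \zeta, \zeta\rangle_f = 0\Bigr\} \]

of this sesquilinear form is a linear subspace of $\mathcal{A} \otimes \mathcal{H}$. On the quotient space

\[ \mathcal{K}_f^0 = \mathcal{A} \otimes \mathcal{H}/I_f = \{[\zeta]_f = \zeta + I_f : \zeta \in \mathcal{A} \otimes \mathcal{H}\} \]

the formula

\[ \langle [\zeta]_f, [\chi]_f\rangle = \langle \zeta, \chi\rangle_f \]

then defines an inner product and thus the completion $\mathcal{K}_f$ of $\mathcal{K}_f^0$ with respect to the
norm defined by this inner product is a Hilbert space.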
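Construction of $\pi_f$: For $[\zeta]_f \in \mathcal{K}_f^0$, $\zeta = \sum_{i=1}^k a_i \otimes \xi_i \in \mathcal{A} \otimes \mathcal{H}$ define

\[ \pi_f^0(a)[\zeta]_f = \Bigl[\sum_{i=1}^k a a_i \otimes \xi_i\Bigr]_f \quad \forall\, a \in \mathcal{A}. \tag{30.46} \]

At first we calculate

\[ \langle \pi_f^0(a)[\zeta]_f, \pi_f^0(a)[\zeta]_f\rangle = \Bigl\langle \sum_{i=1}^k a a_i \otimes \xi_i, \sum_{j=1}^k a a_j \otimes \xi_j\Bigr\rangle_f = \sum_{i,j=1}^k \langle \xi_i, f((a a_i)^*(a a_j))\xi_j\rangle_{\mathcal{H}} = \langle \xi, f_k([(a a_i)^*(a a_j)])\xi\rangle_{\mathcal{H}^k} \]

where $\xi = (\xi_1, \dots, \xi_k) \in \mathcal{H}^k$. Lemma 30.2 says

\[ [a_i^* a^* a a_j] \le \|a\|^2 [a_i^* a_j]. \]

Since $f$ is completely positive, the map $f_k$ is positive and therefore

\[ f_k([a_i^* a^* a a_j]) \le \|a\|^2 f_k([a_i^* a_j]). \]

We conclude

\[ \langle \xi, f_k([(a a_i)^*(a a_j)])\xi\rangle_{\mathcal{H}^k} \le \|a\|^2 \langle \xi, f_k([a_i^* a_j])\xi\rangle_{\mathcal{H}^k} \]

and hence

\[ \langle \pi_f^0(a)[\zeta]_f, \pi_f^0(a)[\zeta]_f\rangle \le \|a\|^2 \langle [\zeta]_f, [\zeta]_f\rangle. \tag{30.47} \]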


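This estimate shows first that $\pi_f^0(a)$ is well defined (i.e., $\pi_f^0(a)$ is indeed a map
between equivalence classes and does not depend on the representatives of the
equivalence classes which are used in its definition).

Now, using (30.46), it is a straightforward calculation to show that $\pi_f^0(a) : \mathcal{K}_f^0 \to
\mathcal{K}_f^0$ is linear. Then (30.47) implies that this map is bounded and $\|\pi_f^0(a)\| \le \|a\|$.

From the definition (30.46) it is immediate that $\pi_f^0$ satisfies

\[ \pi_f^0(ab) = \pi_f^0(a)\pi_f^0(b) \quad \forall\, a, b \in \mathcal{A}. \]

In order to show

\[ \pi_f^0(a^*) = \pi_f^0(a)^* \quad \forall\, a \in \mathcal{A} \]

we calculate as follows: for $\zeta, \chi$ as in (30.45) and $a \in \mathcal{A}$, using (30.46),

\[ \langle \pi_f^0(a)^*[\zeta]_f, [\chi]_f\rangle = \langle [\zeta]_f, \pi_f^0(a)[\chi]_f\rangle = \sum_{i=1}^k \sum_{j=1}^l \langle \xi_i, f(a_i^* a b_j)\eta_j\rangle_{\mathcal{H}} = \sum_{i=1}^k \sum_{j=1}^l \langle \xi_i, f((a^* a_i)^* b_j)\eta_j\rangle_{\mathcal{H}} \]
\[ = \Bigl\langle \Bigl[\sum_{i=1}^k (a^* a_i) \otimes \xi_i\Bigr]_f, \Bigl[\sum_{j=1}^l b_j \otimes \eta_j\Bigr]_f\Bigr\rangle = \langle \pi_f^0(a^*)[\zeta]_f, [\chi]_f\rangle. \]

Since this identity is true for all $[\zeta]_f, [\chi]_f \in \mathcal{K}_f^0$ we conclude.

This establishes that $\pi_f^0$ is a representation of $\mathcal{A}$ on $\mathcal{K}_f^0$.

By continuity $\pi_f^0$ has a unique extension to a representation $\pi_f$ on the completion
$\mathcal{K}_f$ of $\mathcal{K}_f^0$.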
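Construction of $V$: Define $V^0 : \mathcal{H} \to \mathcal{K}_f^0$ by

$$V^0 \xi = [I \otimes \xi]_f \qquad \forall\, \xi \in \mathcal{H}. \tag{30.48}$$

An easy calculation shows that $V^0$ is linear. Now calculate

$$\langle V^0 \xi, V^0 \xi \rangle = \langle I \otimes \xi, I \otimes \xi \rangle_f = \langle \xi, f(I) \xi \rangle_{\mathcal{H}} \le \|f(I)\| \, \langle \xi, \xi \rangle_{\mathcal{H}} \qquad \forall\, \xi \in \mathcal{H}.$$

This shows that $V^0$ is bounded. Since $f(I^* I) = f(I)$ is positive we know $f(I) = (\sqrt{f(I)})^2$ and thus

$$\|V^0 \xi\| = \|\sqrt{f(I)}\, \xi\|_{\mathcal{H}}, \qquad \xi \in \mathcal{H}. \tag{30.49}$$

In the case of $f(I) = I$, $V^0$ is thus an isometry.
Now for $a \in \mathcal{A}$ and $\xi \in \mathcal{H}$ the identity

$$\pi_f^0(a) V^0 \xi = [a \otimes \xi]_f$$

follows. We deduce

$$\pi_f^0(\mathcal{A}) V^0 \mathcal{H} = \mathcal{K}_f^0. \tag{30.50}$$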




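$V^0$ is extended by continuity to a bounded linear operator $V : \mathcal{H} \to \mathcal{K}_f$ and
then the last condition reads

$$[\pi_f(\mathcal{A}) V \mathcal{H}] = \mathcal{K}_f, \tag{30.51}$$

where $[\cdots]$ denotes the closure of the linear hull of $\cdots$ in $\mathcal{K}_f$.
Now all preparations for the proof of the Stinespring factorization formula have
been done. For $\eta, \xi \in \mathcal{H}$ one finds for all $a \in \mathcal{A}$,

$$\langle V^0 \eta, \pi_f^0(a) V^0 \xi \rangle = \langle [I \otimes \eta]_f, [a \otimes \xi]_f \rangle = \langle I \otimes \eta, a \otimes \xi \rangle_f = \langle \eta, f(a) \xi \rangle_{\mathcal{H}}$$

and therefore $f(a) = (V^0)^* \pi_f^0(a) V^0$ for all $a \in \mathcal{A}$. By continuous extension the
Stinespring factorization formula (30.44) follows.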
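For orientation, the factorization $f(a) = V^* \pi(a) V$ can be checked numerically in finite dimensions. The following is only a sketch under stated assumptions: it does not follow the quotient-space construction of the proof, but assumes that $f$ is already given in the form $f(a) = \sum_j a_j^* a a_j$ on $n \times n$ matrices and uses the concrete dilation $\pi(a) = a \otimes I_r$, $V\xi = \sum_j a_j \xi \otimes e_j$. The names `stinespring_from_kraus` and `f` are illustrative, not from the text.

```python
import numpy as np

def f(kraus_ops, a):
    """Assumed form of the map: f(a) = sum_j a_j^* a a_j."""
    return sum(a_j.conj().T @ a @ a_j for a_j in kraus_ops)

def stinespring_from_kraus(kraus_ops):
    """Return (V, pi) such that f(a) = V^* pi(a) V on C^n (x) C^r."""
    r = len(kraus_ops)
    # V xi = sum_j (a_j xi) (x) e_j, stored as a matrix of shape (n*r, n).
    V = sum(np.kron(a_j, np.eye(r)[:, [j]]) for j, a_j in enumerate(kraus_ops))

    def pi(a):
        # Representation a -> a (x) I_r.
        return np.kron(a, np.eye(r))

    return V, pi

rng = np.random.default_rng(0)
ops = [rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)) for _ in range(2)]
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
V, pi = stinespring_from_kraus(ops)
print(np.allclose(V.conj().T @ pi(a) @ V, f(ops, a)))
```

The identity rests on the mixed-product rule for Kronecker products; the dilation space $\mathbb{C}^n \otimes \mathbb{C}^r$ need not be minimal in the sense of (30.51).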


Corollary 30.3 (Uniqueness Under Minimality Condition) Let $f : \mathcal{A} \to \mathcal{B}$ be
a completely positive map as in Theorem 30.6 and let

$$f(a) = U^* \pi(a) U, \qquad a \in \mathcal{A} \tag{30.52}$$

be a Stinespring factorization of $f$ with a representation $\pi$ of $\mathcal{A}$ in a Hilbert space
$\mathcal{K}$ and a bounded linear operator $U : \mathcal{H} \to \mathcal{K}$. If this factorization satisfies the
minimality condition

$$[\pi(\mathcal{A}) U \mathcal{H}] = \mathcal{K} \tag{30.53}$$

then, up to a unitary transformation, it is the factorization constructed in Theorem 30.6.


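Proof We begin by defining a linear operator $W_0 : \pi_f(\mathcal{A}) V \mathcal{H} \to \pi(\mathcal{A}) U \mathcal{H}$ by the
formula

$$W_0 \Big( \sum_{i=1}^k \pi_f(x_i) V \xi_i \Big) = \sum_{i=1}^k \pi(x_i) U \xi_i, \qquad x_i \in \mathcal{A},\ \xi_i \in \mathcal{H}. \tag{30.54}$$

Now calculate the inner product of these images in $\mathcal{K}$:

$$\Big\langle W_0 \sum_{i=1}^k \pi_f(x_i) V \xi_i, W_0 \sum_{j=1}^l \pi_f(y_j) V \eta_j \Big\rangle_{\mathcal{K}} = \Big\langle \sum_{i=1}^k \pi(x_i) U \xi_i, \sum_{j=1}^l \pi(y_j) U \eta_j \Big\rangle_{\mathcal{K}}$$

$$= \sum_{i=1}^k \sum_{j=1}^l \langle U \xi_i, \pi(x_i)^* \pi(y_j) U \eta_j \rangle_{\mathcal{K}} = \sum_{i=1}^k \sum_{j=1}^l \langle \xi_i, U^* \pi(x_i^* y_j) U \eta_j \rangle_{\mathcal{H}}$$

$$= \sum_{i=1}^k \sum_{j=1}^l \langle \xi_i, f(x_i^* y_j) \eta_j \rangle_{\mathcal{H}} = \sum_{i=1}^k \sum_{j=1}^l \langle \xi_i, V^* \pi_f(x_i^* y_j) V \eta_j \rangle_{\mathcal{H}}$$

$$= \Big\langle \sum_{i=1}^k \pi_f(x_i) V \xi_i, \sum_{j=1}^l \pi_f(y_j) V \eta_j \Big\rangle_{\mathcal{K}_f}.$$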




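We conclude that $W_0 : \pi_f(\mathcal{A}) V \mathcal{H} \to \pi(\mathcal{A}) U \mathcal{H}$ is isometric and thus extends by
continuity to a unitary operator

$$W : [\pi_f(\mathcal{A}) V \mathcal{H}] \to [\pi(\mathcal{A}) U \mathcal{H}],$$

i.e., because of the minimality condition, to a unitary operator $W : \mathcal{K}_f \to \mathcal{K}$. From
the above definition the relations

$$W \pi_f(\cdot) W^* = \pi(\cdot), \qquad W V = U$$

follow immediately.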


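Corollary 30.4 Let $f : \mathcal{A} \to \mathcal{B}$ be a completely positive map as above. Then the
inequality

$$f(a)^* f(a) \le \|f(I)\| \, f(a^* a) \tag{30.55}$$

holds for all $a \in \mathcal{A}$.


Proof By Theorem 30.6 $f$ has a Stinespring factorization $f(a) = V^* \pi(a) V$ and
thus

$$f(a)^* f(a) = (V^* \pi(a) V)^* V^* \pi(a) V = V^* \pi(a)^* V V^* \pi(a) V \le \|V\|^2 V^* \pi(a)^* \pi(a) V = \|V\|^2 f(a^* a).$$

The estimate $\|V \xi\|_{\mathcal{K}} \le \|f(I)^{1/2} \xi\|_{\mathcal{H}}$ for $\xi \in \mathcal{H}$ implies $\|V\| \le \|f(I)^{1/2}\|$ or
$\|V\|^2 \le \|f(I)^{1/2}\|^2 = \|f(I)\|$.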
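Inequality (30.55) can be probed numerically. The sketch below assumes, purely for illustration, a completely positive map on $n \times n$ matrices of the form $f(a) = \sum_j a_j^* a a_j$ and reports the smallest eigenvalue of $\|f(I)\| f(a^* a) - f(a)^* f(a)$, which should be nonnegative up to rounding. The function names are hypothetical.

```python
import numpy as np

def kraus_map(kraus_ops, a):
    """Assumed form f(a) = sum_j a_j^* a a_j."""
    return sum(k.conj().T @ a @ k for k in kraus_ops)

def schwarz_gap(kraus_ops, a):
    """Smallest eigenvalue of ||f(I)|| f(a^* a) - f(a)^* f(a)."""
    n = a.shape[0]
    norm_f_one = np.linalg.norm(kraus_map(kraus_ops, np.eye(n)), 2)  # operator norm
    fa = kraus_map(kraus_ops, a)
    gap = norm_f_one * kraus_map(kraus_ops, a.conj().T @ a) - fa.conj().T @ fa
    gap = (gap + gap.conj().T) / 2  # remove rounding asymmetry
    return np.linalg.eigvalsh(gap).min()

rng = np.random.default_rng(0)
ops = [rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)) for _ in range(3)]
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
print(schwarz_gap(ops, a) >= -1e-8)
```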


30.4.5 Completely Positive Mappings on B(H) 


Theorem 30.2 determines the structure of representations of the C*-algebra $\mathcal{B}(\mathcal{H})$. If
we combine this result with Stinespring's factorization theorem we arrive at a more
concrete form of completely positive mappings on $\mathcal{B}(\mathcal{H})$.

Suppose that operators $a_1, \dots, a_m \in \mathcal{B}(\mathcal{H})$ are given. Define $f_m : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ by

$$f_m(a) = \sum_{j=1}^m a_j^* a a_j.$$

Using Corollary 30.1, one shows easily that this mapping is completely positive.
Next, suppose that we are given a sequence $(a_j) \subset \mathcal{B}(\mathcal{H})$ of operators for which
there is a positive operator $B$ such that for all $m \in \mathbb{N}$

$$S_m = \sum_{j=1}^m a_j^* a_j \le B. \tag{30.56}$$




Thus, we get a sequence of completely positive mappings $f_m$ on $\mathcal{B}(\mathcal{H})$ which
converges.


Lemma 30.4 (Completely Positive Maps on $\mathcal{B}(\mathcal{H})$) For a sequence of operators
$a_j \in \mathcal{B}(\mathcal{H})$ which satisfies (30.56) the series

$$f(a) = \sum_{j=1}^\infty a_j^* a a_j, \qquad a \in \mathcal{B}(\mathcal{H}) \text{ fixed} \tag{30.57}$$

converges in the ultraweak operator topology on $\mathcal{B}(\mathcal{H})$ and defines a completely
positive mapping which satisfies

$$f(I) \le B. \tag{30.58}$$


Proof Recall that every $a \in \mathcal{B}(\mathcal{H})$ has a representation as a complex linear combination of four positive elements. Thus, it suffices to show this convergence for positive
$a \in \mathcal{B}(\mathcal{H})$. In this case we know that $0 \le a \le \|a\| I$ and it follows for arbitrary
$x \in \mathcal{H}$ and $m \in \mathbb{N}$

$$\sum_{j=1}^m \langle x, a_j^* a a_j x \rangle \le \|a\| \sum_{j=1}^m \langle x, a_j^* a_j x \rangle = \|a\| \langle x, S_m x \rangle \le \|a\| \langle x, B x \rangle,$$

hence this monotone increasing sequence is bounded from above and thus it
converges:

$$\sum_{j=1}^\infty \langle x, a_j^* a a_j x \rangle \le \|a\| \langle x, B x \rangle.$$

The polarization identity (14.5) implies that for arbitrary $x, y \in \mathcal{H}$ the numerical
series

$$\sum_{j=1}^\infty \langle x, a_j^* a a_j y \rangle$$

converges. This shows that the series (30.57) converges in the weak operator topology
and thus defines $f(a) \in \mathcal{B}(\mathcal{H})$. Since the partial sums of the series considered are
bounded, we also have ultraweak convergence by Lemma 26.2 (on bounded sets both
topologies coincide).

Finally, we show that $f$ is completely positive by showing that it is $k$-positive for every $k \in \mathbb{N}$. According to Corollary 30.1, choose arbitrary
$A_1, \dots, A_k, B_1, \dots, B_k \in \mathcal{B}(\mathcal{H})$. For every $x \in \mathcal{H}$ we find

$$\Big\langle x, \sum_{i,j=1}^k B_i^* f(A_i^* A_j) B_j x \Big\rangle = \lim_{m \to \infty} \Big\langle x, \sum_{i,j=1}^k B_i^* f_m(A_i^* A_j) B_j x \Big\rangle \ge 0$$

since $f_m$ is $k$-positive. We conclude that $\sum_{i,j=1}^k B_i^* f(A_i^* A_j) B_j \ge 0$ in $\mathcal{B}(\mathcal{H})$ and
hence $f$ is $k$-positive.
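The $k$-positivity used in this proof can be illustrated for a finite sum $f_m$ on matrices. The sketch below assumes $n \times n$ matrices in place of $\mathcal{B}(\mathcal{H})$ and checks that the block matrix $[f_m(A_i^* A_j)]_{i,j}$ has no negative eigenvalue; evaluating its quadratic form on the stacked vector $(B_1 x, \dots, B_k x)$ gives the expression of Corollary 30.1 quoted above. The helper names are illustrative.

```python
import numpy as np

def f_m(ops, a):
    """Finite sum f_m(a) = sum_j a_j^* a a_j."""
    return sum(c.conj().T @ a @ c for c in ops)

def amplified_block(ops, a_list):
    """Block matrix [f_m(A_i^* A_j)]_{i,j}, i.e. f_m applied entrywise to [A_i^* A_j]."""
    k = len(a_list)
    return np.block([[f_m(ops, a_list[i].conj().T @ a_list[j]) for j in range(k)]
                     for i in range(k)])

def min_eig(h):
    """Smallest eigenvalue of the Hermitian part of h."""
    return np.linalg.eigvalsh((h + h.conj().T) / 2).min()

rng = np.random.default_rng(0)
rand = lambda: rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
ops = [rand() for _ in range(3)]
a_list = [rand() for _ in range(4)]
print(min_eig(amplified_block(ops, a_list)) >= -1e-8)
```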




Combining Stinespring’s factorization theorem with Theorem 30.2 shows that 
essentially all completely positive mappings on B(H) are of the form (30.57). 


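Theorem 30.7 Every completely positive mapping $f : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ is of the
form

$$f(a) = V_0^* \pi_0(a) V_0 + \sum_{j \in J} a_j^* a a_j, \qquad a \in \mathcal{B}(\mathcal{H}) \tag{30.59}$$

with the following specifications:
$\pi_0$ is a representation of the quotient algebra $\mathcal{B}(\mathcal{H})/\mathcal{B}_c(\mathcal{H})$ on a Hilbert space $\mathcal{H}_0$
(hence $\pi_0(b) = 0$ for all $b \in \mathcal{B}_c(\mathcal{H})$), $V_0$ is a bounded linear operator $\mathcal{H} \to \mathcal{H}_0$, $J$
is a finite or countable index set, and $a_j \in \mathcal{B}(\mathcal{H})$ satisfy

$$\sum_{j \in J} a_j^* a_j \le f(I). \tag{30.60}$$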


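Proof Theorem 30.6 implies that a given completely positive mapping on $\mathcal{B}(\mathcal{H})$ is
of the form (30.44) with a representation $\pi$ of $\mathcal{B}(\mathcal{H})$ on a Hilbert space $\mathcal{K}$ and a
bounded linear operator $V : \mathcal{H} \to \mathcal{K}$, i.e.,

$$f(a) = V^* \pi(a) V, \qquad a \in \mathcal{B}(\mathcal{H}).$$

The general form of representations of $\mathcal{B}(\mathcal{H})$ has been determined in Theorem
30.2. According to this result $\pi$ has the direct sum decomposition ($J$ some finite
or countable index set)

$$\pi = \pi_0 \oplus \bigoplus_{j \in J} \pi_j$$

where for $j \in J$, $\pi_j$ is the identity representation $\pi_j(a) = a$ in the Hilbert space
$\mathcal{H}_j = \mathcal{H}$ and where $\pi_0$ is a representation of the quotient algebra $\mathcal{B}(\mathcal{H})/\mathcal{B}_c(\mathcal{H})$. This
means that there is a unitary operator $U$ from the representation space $\mathcal{K}$ of $\pi$ onto
the direct sum of the Hilbert spaces of these representations

$$\mathcal{H}_0 \oplus \bigoplus_{j \in J} \mathcal{H}_j.$$

Denote the projectors of $U\mathcal{K}$ onto $\mathcal{H}_0$ by $P_0$, respectively onto $\mathcal{H}_j$ by $P_j$, $j \in J$.
Thus, $f(a) = V^* \pi(a) V$ takes the form

$$f(a) = V_0^* \pi_0(a) V_0 + \sum_{j \in J} a_j^* a a_j, \qquad a \in \mathcal{B}(\mathcal{H})$$

where $V_0 = P_0 U V$ and $a_j = P_j U V$ for $j \in J$. Since $\pi_0(I) \ge 0$, and hence $V_0^* \pi_0(I) V_0 \ge 0$, evaluating this formula at $a = I$ gives the bound
(30.60).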




30.5 Exercises 


1. Denote the transposition on $M_2(\mathbb{C})$ by $T$, i.e., $T(A) = A^t$. Show that $T$ is positive
and preserves unity. Consider the trivial extension $I \otimes T : M_2(\mathbb{C}) \otimes M_2(\mathbb{C}) \to M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ given by $A \otimes B \mapsto A \otimes B^t$. Show that this extension is not
positive.

Hints: Choose special simple examples for $A$, $B$.

2. In $M_2(\mathbb{C})$ consider the linear map $T$ defined by its action on the basis elements
$\{I_2, \sigma = (\sigma_x, \sigma_y, \sigma_z)\}$ of $M_2(\mathbb{C})$: $T(I_2) = I_2$, $T(\sigma) = \lambda \sigma$. For which $\lambda$ is $T$
positive, respectively completely positive?

3. Show that the space $\mathcal{B}_c(\mathcal{H})$ of compact operators is a *-algebra which is not weakly
closed, i.e., $\mathcal{B}_c(\mathcal{H})$ is not a von Neumann algebra.

Hints: Show $\mathcal{B}_c(\mathcal{H})'' = \mathcal{B}(\mathcal{H})$ and recall the "algebraic" characterization of a von
Neumann algebra.

4. Let $T$ be a positive map of the Abelian *-algebra $\mathcal{A}$ into the Abelian *-algebra
$\mathcal{B}$. Show that $T$ is completely positive. Reflect on the meaning of this result for
quantum physics.

References 

1. Buchholz D, Haag R. The quest for understanding in relativistic quantum physics. J Math Phys. 
2000;41:3674-3697. 

2. Jost R. The general theory of quantized fields. Providence: American Math Soc.; 1965. 

3. Takesaki M. Theory of operator algebras I. Encyclopedia of mathematical sciences—operator 
algebras and non-commutative geometry. Vol. 124. Berlin: Springer; 2002. 

4. Kelley JL. General topology. New York: Springer; 1991.

5. Ringrose JR, Kadison RV. Fundamentals of the theory of operator algebras: elementary theory. 
Vol. I. Academic Press; 1983. 

6. Stinespring WF. Positive functions on C*-algebras. Proc Am Math Soc. 1955;6(2):211-216. 


Chapter 31 
Positive Mappings in Quantum Physics 


31.1 Gleason’s Theorem 


In Theorem 26.4 we learned that the continuous linear functionals on the space 
of all compact operators on a separable Hilbert space H are given by trace class 
operators according to the Trace Formula (26.10). There is a profound related result 
due to A. Gleason which roughly says that this trace formula holds when we start 
with a countably additive probability measure on the projections of H instead of a 
continuous linear functional on the compact operators on H (Recall that all finite 
dimensional projections belong to the space of compact operators). 

Gleason’s result [1] is very important for the (mathematical) foundation of quan- 
tum mechanics. A historical perspective and some key ideas related to this work are 
presented in [2]. 


Theorem 31.1 (Gleason's Theorem) Let $\mu$ be a countably additive probability
measure on the projections of a separable Hilbert space $\mathcal{H}$ of dimension greater
than 2. Then there is a unique nonnegative trace class operator $W$ of trace 1 such
that for every projection $P$ on $\mathcal{H}$ one has

$$\mu(P) = \mathrm{Tr}(W P). \tag{31.1}$$


The original proof by Gleason relies on methods not related to topics presented in 
this book. Moreover this proof is quite long. Accordingly, we do not present it here. 
Instead, we discuss a weakened version due to P. Busch [3] which however is valid 
in any Hilbert space, contrary to Gleason’s result. 

A proof of Gleason’s original result which is more easily accessible is given in 
[4, 7, 8]. 

In [5] the physical meaning of effects and their mathematical realization is explained. Intuitively the term effect refers to the "effect" of a physical object on a
measuring device. Projections are idempotent effects, i.e., they satisfy $E^2 = E$, and
are interpreted as "properties." Effects which are not properties can be interpreted
as "unsharp properties" (see [6]). Clearly, on any Hilbert space there are many more
effects than properties.


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_31




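Denote by $\mathcal{E}(\mathcal{H})$ the set of all effects on the separable Hilbert space $\mathcal{H}$, i.e., the
set of all $A \in \mathcal{B}(\mathcal{H})$ which satisfy $0 \le A \le I$.


Definition 31.1 A generalized probability measure on all effects on a separable
Hilbert space $\mathcal{H}$ is a function $\mu : \mathcal{E}(\mathcal{H}) \to \mathbb{R}$ which satisfies

1) $0 \le \mu(E) \le 1$ for all $E \in \mathcal{E}(\mathcal{H})$,
2) $\mu(I) = 1$,
3) for any sequence $(E_j) \subset \mathcal{E}(\mathcal{H})$ such that $\sum_j E_j \le I$ one has

$$\mu\Big( \sum_j E_j \Big) = \sum_j \mu(E_j).$$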


Similar to Gleason's result one would like to know the general form of generalized
probability measures on effects. It turns out that the analysis in this case is much
simpler, mainly due to the fact that now a more or less standard extension in the
ordered vector space of self-adjoint elements in $\mathcal{B}(\mathcal{H})$,

$$\mathcal{B}_s(\mathcal{H}) = \mathcal{B}(\mathcal{H})_+ - \mathcal{B}(\mathcal{H})_+,$$

which is generated by the positive elements, is possible.


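Lemma 31.1 Any generalized probability measure $\mu$ on $\mathcal{E}(\mathcal{H})$ is the restriction of
a positive linear functional $f : \mathcal{B}(\mathcal{H}) \to \mathbb{C}$ to the set of effects $\mathcal{E}(\mathcal{H})$: $\mu = f|_{\mathcal{E}(\mathcal{H})}$.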


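Proof Because of the defining conditions 1) and 3), a generalized probability measure $\mu$ on effects is monotone, i.e., if $E, F \in \mathcal{E}(\mathcal{H})$ satisfy $E \le F$ then $\mu(E) \le \mu(F)$.
If $E \in \mathcal{E}(\mathcal{H})$ and $n \in \mathbb{N}$ are given, condition 3) implies $\mu(E) = n \mu(\tfrac{1}{n} E)$, since
$E = \tfrac{1}{n} E + \cdots + \tfrac{1}{n} E$ ($n$ summands). Next suppose $m, n \in \mathbb{N}$ are given and $m \le n$
so that $m/n \le 1$, thus $\tfrac{m}{n} E \in \mathcal{E}(\mathcal{H})$ and the relation $\mu(\tfrac{1}{n} E) = \tfrac{1}{n} \mu(E)$ implies ($\cdots$
means $m$ summands)

$$\frac{m}{n} \mu(E) = m \mu\Big( \frac{1}{n} E \Big) = \mu\Big( \frac{1}{n} E + \cdots + \frac{1}{n} E \Big) = \mu\Big( \frac{m}{n} E \Big),$$

and therefore, $\mu(q E) = q \mu(E)$ for all rational numbers $q \in [0, 1]$.

If $0 < r < 1$ is any real number, there are sequences of rational numbers $q_j, p_j \in (0, 1)$ such that $p_j \downarrow r$ and $q_j \uparrow r$ and for all $j \in \mathbb{N}$, $0 < q_j \le r \le p_j < 1$. Then
we know $q_j E \le r E \le p_j E$ and therefore by monotonicity of $\mu$

$$q_j \mu(E) = \mu(q_j E) \le \mu(r E) \le \mu(p_j E) = p_j \mu(E).$$

In the limit $j \to \infty$, we thus get $r \mu(E) \le \mu(r E) \le r \mu(E)$. This implies $\mu(r E) = r \mu(E)$ for all $E \in \mathcal{E}(\mathcal{H})$ and all $r \in [0, 1]$.

Next, suppose that $A \in \mathcal{B}(\mathcal{H})$ is given satisfying $0 \le A$ but not $A \le I$. Then
there is $r \ge 1$ such that $E = \tfrac{1}{r} A \in \mathcal{E}(\mathcal{H})$. But clearly $r$ and $E$ are not unique. If
we have $A = r_1 E_1 = r_2 E_2$ we can assume that $r_2 > r_1 \ge 1$ so that $0 < \tfrac{r_1}{r_2} < 1$. It
follows $\mu(E_2) = \mu(\tfrac{r_1}{r_2} E_1) = \tfrac{r_1}{r_2} \mu(E_1)$ or $r_1 \mu(E_1) = r_2 \mu(E_2)$. This allows to define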




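$\mu_1 : \mathcal{B}(\mathcal{H})_+ \to \mathbb{R}$ by $\mu_1(A) = r \mu(E)$ whenever $A = r E$ with $E \in \mathcal{E}(\mathcal{H})$ and
$r \ge 1$.

Clearly, $\mu_1$ is positive homogeneous on the convex cone $\mathcal{B}(\mathcal{H})_+$ of nonnegative
bounded linear operators on $\mathcal{H}$.

In order to show that $\mu_1$ is additive on $\mathcal{B}(\mathcal{H})_+$ take $A, B \in \mathcal{B}(\mathcal{H})_+$. Then there is
$r > 1$ such that $(A + B)/r \in \mathcal{E}(\mathcal{H})$. The definition of $\mu_1$ gives $\mu_1(A + B) = r \mu(\tfrac{1}{r}(A + B)) = r \mu(\tfrac{1}{r} A) + r \mu(\tfrac{1}{r} B) = \mu_1(A) + \mu_1(B)$.

Altogether we have shown that $\mu_1$ is an additive positive homogeneous function
on the convex cone $\mathcal{B}(\mathcal{H})_+$. Thus, according to a standard procedure in the theory of
ordered vector spaces, the functional $\mu_1$ can be extended to a linear functional $\mu_2$ on
$\mathcal{B}_s(\mathcal{H})$. If $C = A - B \in \mathcal{B}(\mathcal{H})_+ - \mathcal{B}(\mathcal{H})_+$ define $\mu_2(C) = \mu_1(A) - \mu_1(B)$. It is easy
to see that $\mu_2$ is well-defined. If $C$ is also represented as $A' - B' \in \mathcal{B}(\mathcal{H})_+ - \mathcal{B}(\mathcal{H})_+$,
then it follows $\mu_1(A) - \mu_1(B) = \mu_1(A') - \mu_1(B')$, since $A' + B = A + B'$ implies
$\mu_1(A') + \mu_1(B) = \mu_1(A' + B) = \mu_1(A + B') = \mu_1(A) + \mu_1(B')$. In order to
show that $\mu_2 : \mathcal{B}_s(\mathcal{H}) \to \mathbb{R}$ is additive, take $C = A - B$ and $C' = A' - B'$ in
$\mathcal{B}(\mathcal{H})_+ - \mathcal{B}(\mathcal{H})_+$ and calculate $\mu_2(C + C') = \mu_2(A - B + A' - B') = \mu_2(A + A' - (B + B')) = \mu_1(A + A') - \mu_1(B + B') = \mu_1(A) + \mu_1(A') - \mu_1(B) - \mu_1(B') = \mu_2(A - B) + \mu_2(A' - B') = \mu_2(C) + \mu_2(C')$.

Clearly, since $\mu_1$ is positive homogeneous, so is $\mu_2$. Next, suppose $\lambda < 0$ and $C = A - B \in \mathcal{B}(\mathcal{H})_+ - \mathcal{B}(\mathcal{H})_+$ are given. Then $\lambda C = (-\lambda) B - (-\lambda) A \in \mathcal{B}(\mathcal{H})_+ - \mathcal{B}(\mathcal{H})_+$
and so $\mu_2(\lambda C) = \mu_1((-\lambda) B) - \mu_1((-\lambda) A) = (-\lambda) \mu_1(B) - (-\lambda) \mu_1(A) = \lambda(\mu_1(A) - \mu_1(B)) = \lambda \mu_2(C)$.

It follows that $\mu_2 : \mathcal{B}_s(\mathcal{H}) \to \mathbb{R}$ is a positive linear functional which agrees
with $\mu$ on $\mathcal{E}(\mathcal{H})$. Since $\mathcal{B}(\mathcal{H}) = \mathcal{B}_s(\mathcal{H}) + i \mathcal{B}_s(\mathcal{H})$ the real linear functional $\mu_2$
is extended to a complex linear functional $f : \mathcal{B}(\mathcal{H}) \to \mathbb{C}$ by setting for $A = a + i b \in \mathcal{B}_s(\mathcal{H}) + i \mathcal{B}_s(\mathcal{H})$, $f(A) = \mu_2(a) + i \mu_2(b)$. A simple calculation shows
that $f$ is indeed complex linear on $\mathcal{B}(\mathcal{H})$ and by construction $\mu = f|_{\mathcal{E}(\mathcal{H})}$.

Note that in the proof of this lemma, condition 2) has not been used and condition
3) has been used only for finitely many effects.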


Theorem 31.2 (Busch) Any generalized probability measure $\mu$ on the set of effects
$\mathcal{E}(\mathcal{H})$ of a separable Hilbert space $\mathcal{H}$ is of the form

$$\mu(E) = \mathrm{Tr}(W E) \qquad \text{for all } E \in \mathcal{E}(\mathcal{H})$$

for some density operator $W$.


Proof According to the extension Lemma 31.1, any generalized probability measure
$\mu$ on the set of effects is the restriction to this set of a positive linear functional
$f$ on $\mathcal{B}(\mathcal{H})$. Such functionals are continuous according to Proposition 30.1. Now,
since projections are (special) effects, condition 3) says that the functional $f$ is
completely additive (see (30.29)). Hence, we conclude by Theorem 30.5.
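The easy direction of this statement, that $E \mapsto \mathrm{Tr}(W E)$ satisfies conditions 1) to 3) of Definition 31.1, can be checked numerically in finite dimensions. The sketch below is only an illustration under assumptions of its own: a randomly generated density matrix on $\mathbb{C}^n$ and finitely many effects obtained by rescaling random positive matrices so that their sum is bounded by $I$. All helper names are hypothetical.

```python
import numpy as np

def random_density(n, rng):
    """Random density matrix: positive with trace 1."""
    x = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    w = x @ x.conj().T
    return w / np.trace(w).real

def random_effects(n, count, rng):
    """Positive matrices E_1..E_count, rescaled so that their sum is <= I."""
    ps = []
    for _ in range(count):
        x = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
        ps.append(x @ x.conj().T)
    scale = np.linalg.norm(sum(ps), 2)  # largest eigenvalue of the sum
    return [p / scale for p in ps]

def mu(w, e):
    """Candidate measure mu(E) = Tr(W E)."""
    return np.trace(w @ e).real

rng = np.random.default_rng(0)
w = random_density(3, rng)
effects = random_effects(3, 4, rng)
print(np.isclose(mu(w, sum(effects)), sum(mu(w, e) for e in effects)))
```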


Remark 31.1 Technically, in a two-dimensional Hilbert space the condition of additivity of $\mu$ on sets of pairwise orthogonal projections is not strong enough to enforce




linearity of $\mu$. More concretely, let $\mu$ be a normalized additive measure on the projections $E$ of $\mathbb{C}^2$, i.e., $\mu(E) + \mu(E^\perp) = 1$. According to Eq. 18.23 every one-dimensional projection
is of the form $E = \tfrac{1}{2}(I_2 + e \cdot \sigma)$ with a unique unit vector $e \in \mathbb{R}^3$. Introduce the
function $f(e) = \mu(\tfrac{1}{2}(I_2 + e \cdot \sigma))$. Deduce that $f$ satisfies $f(e) + f(-e) = 1$ for all
unit vectors $e \in \mathbb{R}^3$. This condition is not strong enough to force $f$ to be linear.
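A concrete function of this kind may help. The choice $f(e) = \tfrac{1}{2}(1 + e_3^3)$ in the sketch below is an assumption made for illustration only; it satisfies $f(e) + f(-e) = 1$ with values in $[0, 1]$, whereas a measure of the form $\mathrm{Tr}(W E)$ with $W = \tfrac{1}{2}(I_2 + w \cdot \sigma)$ would give $f(e) = \tfrac{1}{2}(1 + w \cdot e)$, so that $g = f - \tfrac{1}{2}$ would be the restriction of a linear function. The defect computed here vanishes for every such linear $g$.

```python
import numpy as np

def f(e):
    """Illustrative function on the unit sphere with f(e) + f(-e) = 1."""
    e = np.asarray(e, dtype=float)
    return 0.5 * (1.0 + e[2] ** 3)

def linearity_defect(e1, e2):
    """g(e1) + g(e2) - |e1 + e2| g((e1 + e2)/|e1 + e2|) for g = f - 1/2.

    This is zero whenever g is the restriction of a linear function.
    """
    s = np.asarray(e1, dtype=float) + np.asarray(e2, dtype=float)
    norm = np.linalg.norm(s)
    g = lambda e: f(e) - 0.5
    return g(e1) + g(e2) - norm * g(s / norm)

theta = np.pi / 3
e1 = np.array([np.sin(theta), 0.0, np.cos(theta)])
e2 = np.array([-np.sin(theta), 0.0, np.cos(theta)])
print(f(e1) + f(-e1), linearity_defect(e1, e2))
```

For these two vectors $e_1 + e_2 = (0, 0, 1)$, so the defect equals $\tfrac{1}{16} + \tfrac{1}{16} - \tfrac{1}{2} = -\tfrac{3}{8}$, confirming that this $f$ is not of trace form.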


31.2. Kraus Form of Quantum Operations 


A quantum mechanical system undergoes various types of transformations, for
instance symmetry transformations, time evolution and transient interactions with
an environment for measurement purposes. These transformations are described by
the concept of a quantum operation, and the nature of these mappings has been
discussed for about 50 years, starting with a paper by Sudarshan et al. in 1961.

A mathematically rigorous and comprehensive study of quantum operations was
published by K. Kraus in 1983 in [5]. Starting from first (physical) principles
it is argued that quantum operations are given mathematically by linear mappings

$$\phi : \mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})$$

of the space of trace class operators on a (separable) Hilbert space $\mathcal{H}$ into itself which
are completely positive and satisfy

$$\mathrm{Tr}(\phi(W)) \le 1 \tag{31.2}$$

for all $W \in \mathcal{B}_1(\mathcal{H})$ with $W \ge 0$ and $\mathrm{Tr}(W) = 1$, i.e., for all density matrices on $\mathcal{H}$.
In Definition 30.6 we had defined completely positive mappings as mappings
between C*-algebras which satisfy certain positivity conditions. Clearly $\mathcal{B}_1(\mathcal{H})$ is
not a C*-algebra with unit (if $\mathcal{H}$ is not finite dimensional) but it is a two-sided ideal
in the C*-algebra $\mathcal{B}(\mathcal{H})$. Therefore, these positivity conditions can be formulated
for $\mathcal{B}_1(\mathcal{H})$ in the same way as for the C*-algebra $\mathcal{B}(\mathcal{H})$. And it is in this sense that
we understand complete positivity for a map $\phi : \mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})$, i.e., $\phi$ is
completely positive if, and only if, it is $k$-positive for $k = 1, 2, \dots$. However, in the
characterization of positive elements in $\mathcal{B}_1(\mathcal{H})$ there is an important difference to
the characterization of positive elements in a C*-algebra. According to the spectral
representation of trace class operators (Theorem 26.3), $T \in \mathcal{B}_1(\mathcal{H})$ is positive if, and
only if, $T = \tau^* \tau$ for some $\tau \in \mathcal{B}_2(\mathcal{H})$ (not in $\mathcal{B}_1(\mathcal{H})$). The characterization of positive
elements in $M_k(\mathcal{B}_1(\mathcal{H}))$ for $k \ge 2$ is addressed explicitly later (see Lemma 31.4).




31.2.1 Operations and Effects 


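Lemma 31.2 If a positive linear map $\phi : \mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})$ satisfies (31.2), then
it is continuous with respect to the trace norm:

$$\|\phi(T)\|_1 \le C \|T\|_1, \qquad C = \sup \mathrm{Tr}(\phi(W)) \le 1 \tag{31.3}$$

where the sup is taken over all density matrices $W$ on $\mathcal{H}$.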


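Proof By (31.2) we obviously have that $C = \sup \mathrm{Tr}(\phi(W)) \le 1$ where the sup is
taken over all density matrices. Write $T = T^* \in \mathcal{B}_1(\mathcal{H})$ as $T = T_+ - T_-$ where
$T_\pm = (|T| \pm T)/2$. We can assume that $\mathrm{Tr}(T_\pm) > 0$. Then $W_\pm = \frac{1}{\mathrm{Tr}(T_\pm)} T_\pm$ are density
matrices and it follows that

$$T = \mathrm{Tr}(T_+) W_+ - \mathrm{Tr}(T_-) W_-$$

and thus $\phi(T) = \mathrm{Tr}(T_+) \phi(W_+) - \mathrm{Tr}(T_-) \phi(W_-)$. According to (30.16), the trace
norm of $\phi(T)$ can be calculated as

$$\|\phi(T)\|_1 = \sup_{B \in \mathcal{B}(\mathcal{H}),\ \|B\| = 1} |\mathrm{Tr}(B \phi(T))|.$$

Insert the above expression for $\phi(T)$ and estimate as follows:

$$|\mathrm{Tr}(T_+) \mathrm{Tr}(B \phi(W_+)) - \mathrm{Tr}(T_-) \mathrm{Tr}(B \phi(W_-))| \le \mathrm{Tr}(T_+) |\mathrm{Tr}(B \phi(W_+))| + \mathrm{Tr}(T_-) |\mathrm{Tr}(B \phi(W_-))|.$$

Since $\phi$ is positive, we know

$$|\mathrm{Tr}(B \phi(W_\pm))| \le \|B\| \, \|\phi(W_\pm)\|_1 = \|B\| \, \mathrm{Tr}(\phi(W_\pm)) \le \|B\| C$$

and thus

$$\|\phi(T)\|_1 \le \mathrm{Tr}(T_+) C + \mathrm{Tr}(T_-) C = C \|T\|_1.$$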


The adjoint $\phi^*$ of an operation $\phi$ in the duality between trace class operators and
bounded linear operators on $\mathcal{H}$ (see Theorem 26.5) is then a linear map

$$\phi^* : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$$

which is positive too (see Lemma 31.4). From a physical point of view this adjoint
is important since it gives the "effect" $F = F_\phi$ corresponding to an operation $\phi$ as

$$F = \phi^*(I).$$


Lemma 31.3 Let $\phi : \mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})$ be a positive linear mapping such that
$\mathrm{Tr}(\phi(W)) \le 1$ for all density matrices $W$ on $\mathcal{H}$. Then its dual map $\phi^*$ (in the duality




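established in Theorem 26.5) is a linear map $\mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ which is well defined
by

$$\mathrm{Tr}(\phi^*(B) T) = \mathrm{Tr}(B \phi(T)) \qquad \text{for all } B \in \mathcal{B}(\mathcal{H}),\ T \in \mathcal{B}_1(\mathcal{H}). \tag{31.4}$$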


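Proof Given such a map $\phi$, it is continuous according to Lemma 31.2: $\|\phi(T)\|_1 \le C \|T\|_1$ for all $T \in \mathcal{B}_1(\mathcal{H})$. Fix $B \in \mathcal{B}(\mathcal{H})$; since

$$|\mathrm{Tr}(B \phi(T))| \le \|B\| \, \|\phi(T)\|_1 \le \|B\| \, C \, \|T\|_1,$$

$T \mapsto \mathrm{Tr}(B \phi(T))$ is a continuous linear functional on $\mathcal{B}_1(\mathcal{H})$ and therefore, according
to Theorem 26.5, of the form $\mathrm{Tr}(C_B T)$ with a unique $C_B \in \mathcal{B}(\mathcal{H})$. This element $C_B$ is
called $\phi^*(B)$. This applies to every $B \in \mathcal{B}(\mathcal{H})$ and thus defines the adjoint mapping
$\phi^*$, and by construction Relation (31.4) holds. A straightforward calculation
establishes linearity of $\phi^*$, using uniqueness in the duality between $\mathcal{B}_1(\mathcal{H})$ and
$\mathcal{B}(\mathcal{H})$.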


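Corollary 31.1 For a positive linear mapping $\phi : \mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})$ the following
statements are equivalent:

a) $\mathrm{Tr}(\phi(W)) \le 1$ for all density matrices $W$ on $\mathcal{H}$;
b) $\phi$ is continuous and $\phi^*(I) \le I$.


Proof Suppose a) holds. Then, by Lemma 31.2 the map $\phi$ is continuous. Thus,
according to Lemma 31.3 the dual mapping $\phi^* : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ is well defined
and Relation (31.4) holds, in particular for all density matrices $W$ and $B = I$,

$$\mathrm{Tr}(\phi^*(I) W) = \mathrm{Tr}(\phi(W)).$$

It follows $\mathrm{Tr}(\phi^*(I) W) \le 1$ for all $W$. For $W = [x, x]$, $x \in \mathcal{H}$, $\|x\| = 1$ this implies
$\mathrm{Tr}(\phi^*(I) [x, x]) = \langle x, \phi^*(I) x \rangle \le 1$ and hence $\langle x, \phi^*(I) x \rangle \le \langle x, x \rangle$ for all $x \in \mathcal{H}$ and
$\phi^*(I) \le I$ follows.

Conversely assume b). Since $\phi$ is continuous, the dual map $\phi^*$ is well defined
and (31.4) holds and thus again $\mathrm{Tr}(\phi^*(I) W) = \mathrm{Tr}(\phi(W))$ for all density matrices $W$.
Now $\phi^*(I) \le I$ implies a):

$$\mathrm{Tr}(\phi(W)) = \mathrm{Tr}(W^{1/2} \phi^*(I) W^{1/2}) \le \mathrm{Tr}(W^{1/2} W^{1/2}) = \mathrm{Tr}(W) = 1.$$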
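The duality relation (31.4) and the equivalence just proven can be illustrated on matrices. The sketch assumes, for illustration only, a map of the form $\phi(T) = \sum_j a_j T a_j^*$ on $n \times n$ matrices (the form studied later in this section), for which $\phi^*(B) = \sum_j a_j^* B a_j$; the operators are rescaled so that $\phi^*(I) \le I$. The function names are hypothetical.

```python
import numpy as np

def phi(ops, t):
    """Assumed map on trace class operators: phi(T) = sum_j a_j T a_j^*."""
    return sum(a @ t @ a.conj().T for a in ops)

def phi_star(ops, b):
    """Its adjoint on bounded operators: phi^*(B) = sum_j a_j^* B a_j."""
    return sum(a.conj().T @ b @ a for a in ops)

def normalize(ops):
    """Rescale the operators so that the largest eigenvalue of phi^*(I) is 1."""
    n = ops[0].shape[1]
    c = np.linalg.norm(phi_star(ops, np.eye(n)), 2)
    return [a / np.sqrt(c) for a in ops]

rng = np.random.default_rng(0)
rand = lambda: rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
ops = normalize([rand() for _ in range(2)])
b, t = rand(), rand()
print(np.isclose(np.trace(phi_star(ops, b) @ t), np.trace(b @ phi(ops, t))))

x = rand()
w = x @ x.conj().T
w = w / np.trace(w).real          # a density matrix
print(np.trace(phi(ops, w)).real <= 1 + 1e-12)
```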


Lemma 31.4 A linear mapping $\phi : \mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})$ is completely positive if,
and only if, its adjoint mapping $\phi^* : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ is completely positive.


Proof Naturally the proof consists in showing that for all $k \in \mathbb{N}$, the mapping $\phi$ is
$k$-positive if, and only if, the adjoint mapping $\phi^*$ is $k$-positive. We do this explicitly
for the case $k = 1$ and indicate the necessary changes for $k \ge 2$.




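If $B \in \mathcal{B}(\mathcal{H})$ is given, define a linear functional $F_B$ on $\mathcal{B}_1(\mathcal{H})$ by $F_B(T) = \mathrm{Tr}(B \phi(T))$. According to Theorem 26.5, the duality is given by the trace formula

$$\mathrm{Tr}(B \phi(T)) = \mathrm{Tr}(\phi^*(B) T) \qquad \text{for all } B \in \mathcal{B}(\mathcal{H}),\ T \in \mathcal{B}_1(\mathcal{H}). \tag{31.5}$$

If $\phi$ is positive, then we know $\phi(T) \ge 0$ for all $T \in \mathcal{B}_1(\mathcal{H})$, $T \ge 0$, and we have
to show that $\phi^*(B) \ge 0$ for all $B \in \mathcal{B}(\mathcal{H})$, $B \ge 0$. According to Theorem 26.3,
$\phi(T) \in \mathcal{B}_1(\mathcal{H})$ is positive if, and only if, it is of the form $\phi(T) = \tau^* \tau$ for some
$\tau = \tau^* \in \mathcal{B}_2(\mathcal{H})$. In this case we have

$$\mathrm{Tr}(B \phi(T)) = \mathrm{Tr}(B \tau^* \tau) = \mathrm{Tr}(\tau B \tau^*) \ge 0 \qquad \text{for all } B \ge 0.$$

The duality relation implies

$$\mathrm{Tr}(\phi^*(B) T) \ge 0 \qquad \text{for all } T \in \mathcal{B}_1(\mathcal{H}),\ T \ge 0.$$

Now choose $x \in \mathcal{H}$ and insert the positive finite rank operator $T = [x, x]$ defined by
$[x, x] y = x \langle x, y \rangle$, $y \in \mathcal{H}$, into this estimate to get

$$0 \le \mathrm{Tr}(\phi^*(B) T) = \langle x, \phi^*(B) x \rangle$$

and thus $\phi^*(B) \ge 0$ for $B \ge 0$.

Conversely, assume that $\phi^*$ is a positive mapping so that $\phi^*(B) \ge 0$ for all
$B \ge 0$. Then, by Lemma 30.1 (or the square root lemma) for some $b \in \mathcal{B}(\mathcal{H})$ we
know $\phi^*(B) = b^* b$ and the duality relation yields

$$\mathrm{Tr}(B \phi(T)) = \mathrm{Tr}(\phi^*(B) T) = \mathrm{Tr}(b^* b T) = \mathrm{Tr}(b T b^*) \ge 0$$

for all $T \ge 0$. As above insert $B = [x, x]$ to get $\langle x, \phi(T) x \rangle \ge 0$ whenever $T \ge 0$
and hence the mapping $\phi$ is positive.

Now assume $k \ge 2$; abbreviate $\mathcal{A} = \mathcal{B}_1(\mathcal{H})$ and $\mathcal{B} = \mathcal{B}(\mathcal{H})$. We have to show
that $\phi_k : M_k(\mathcal{A}) \to M_k(\mathcal{A})$ is positive if, and only if, $\phi_k^* : M_k(\mathcal{B}) \to M_k(\mathcal{B})$
is positive. Recall that $A = [a_{ij}] \in M_k(\mathcal{A})$ respectively $B = [b_{ij}] \in M_k(\mathcal{B})$ act
on the Hilbert space $\mathcal{H}^k = \mathcal{H} \times \mathcal{H} \times \cdots \times \mathcal{H}$ ($k$ components). Under standard
matrix operations we have $M_k(\mathcal{B}) = \mathcal{B}(\mathcal{H}^k)$ and similarly $M_k(\mathcal{A}) = \mathcal{B}_1(\mathcal{H}^k)$ (see
the Exercises). For the relation of traces in $\mathcal{H}$ and in $\mathcal{H}^k$ one finds (see again the
Exercises for this chapter)

$$\mathrm{Tr}_{\mathcal{H}^k}([T_{ij}]) = \sum_{j=1}^k \mathrm{Tr}(T_{jj})$$

when $\mathrm{Tr}$ denotes the trace in $\mathcal{H}$. Thus, we get the extended duality formula

$$\mathrm{Tr}_{\mathcal{H}^k}([b_{ij}] \, \phi_k([T_{ij}])) = \mathrm{Tr}_{\mathcal{H}^k}(\phi_k^*([b_{ij}]) \, [T_{ij}]) \tag{31.6}$$

since

$$\mathrm{Tr}_{\mathcal{H}^k}([b_{ij}] \, \phi_k([T_{ij}])) = \sum_{i,j=1}^k \mathrm{Tr}(b_{ij} \phi(T_{ji})) = \sum_{i,j=1}^k \mathrm{Tr}(\phi^*(b_{ij}) T_{ji}).$$

Therefore, we can argue for $\phi_k : \mathcal{B}_1(\mathcal{H}^k) \to \mathcal{B}_1(\mathcal{H}^k)$ and $\phi_k^* : \mathcal{B}(\mathcal{H}^k) \to \mathcal{B}(\mathcal{H}^k)$
as above for $\phi$ and $\phi^*$.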




31.2.2. The Representation Theorem for Operations 


Naturally the question about the general mathematical form of a quantum operation
arises. The answer was given in [5]. In Sect. 30.4.5, we studied completely
positive maps on $\mathcal{B}(\mathcal{H})$. Here, we begin by investigating completely positive maps
on trace class operators and find some extensions of the earlier results.


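Proposition 31.1 For a sequence of operators $a_j \in \mathcal{B}(\mathcal{H})$ which satisfies (30.56)
with bound $B$ the series

$$\phi(T) = \sum_{j=1}^\infty a_j T a_j^*, \qquad T \in \mathcal{B}_1(\mathcal{H}) \text{ fixed} \tag{31.7}$$

converges in trace norm and defines a completely positive mapping on $\mathcal{B}_1(\mathcal{H})$. The
related series (30.57), i.e.,

$$f(a) = \sum_{j=1}^\infty a_j^* a a_j, \qquad a \in \mathcal{B}(\mathcal{H}) \text{ fixed}$$

converges ultraweakly and defines the adjoint of $\phi$, i.e.,

$$\phi^*(a) = \sum_{j=1}^\infty a_j^* a a_j, \qquad a \in \mathcal{B}(\mathcal{H}) \text{ fixed}. \tag{31.8}$$

Furthermore,

$$\phi^*(I) \le B.$$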


Proof Given $0 \le T \in \mathcal{B}_1(\mathcal{H})$ define for $m \in \mathbb{N}$,

$$\phi_m(T) = \sum_{j=1}^m a_j T a_j^*.$$

Clearly $\phi_m(T)$ is nonnegative and of trace class; thus, for $m > n$ we find

$$\|\phi_m(T) - \phi_n(T)\|_1 = \Big\| \sum_{j=n+1}^m a_j T a_j^* \Big\|_1 = \mathrm{Tr}\Big( \sum_{j=n+1}^m a_j T a_j^* \Big) = \mathrm{Tr}\Big( \sum_{j=n+1}^m a_j^* a_j T \Big) = \mathrm{Tr}(S_m T) - \mathrm{Tr}(S_n T)$$

where the operators $S_m = \sum_{j=1}^m a_j^* a_j$ were introduced in (30.56). Because of the
ultraweak convergence $S_m \to S$ according to Lemma 30.4 we know $\mathrm{Tr}(S_m T) \to \mathrm{Tr}(S T)$ and we conclude that $(\phi_m(T))$ is a Cauchy sequence with respect to the trace
norm and therefore this sequence converges in trace norm to a unique $\phi(T) \in \mathcal{B}_1(\mathcal{H})$.




Since the trace norm dominates the operator norm, we also have convergence of
(31.7) in operator norm and thus also ultraweakly. Since every trace class operator is
the complex linear combination of four positive ones, the series (31.7) converges for
every $T \in \mathcal{B}_1(\mathcal{H})$ with respect to the topologies as indicated above. The complete
positivity of $\phi$ follows as in the proof of Lemma 30.4.

These continuity properties allow us to determine the dual $\phi^*$ of $\phi$ easily. This dual
is determined by

$$\mathrm{Tr}(B \phi(T)) = \mathrm{Tr}(\phi^*(B) T) \qquad \text{for all } B \in \mathcal{B}(\mathcal{H}),\ T \in \mathcal{B}_1(\mathcal{H}).$$

We have

$$\mathrm{Tr}(B \phi(T)) = \lim_{m \to \infty} \mathrm{Tr}(B \phi_m(T))$$

and by property c) of Corollary 26.3

$$\mathrm{Tr}\Big( B \sum_{j=1}^m a_j T a_j^* \Big) = \mathrm{Tr}\Big( \sum_{j=1}^m a_j^* B a_j T \Big).$$

According to Lemma 30.4, we know $\lim_{m \to \infty} \sum_{j=1}^m a_j^* B a_j = f(B)$ in the ultraweak
topology, hence $\mathrm{Tr}(\phi^*(B) T) = \mathrm{Tr}(f(B) T)$ for all $T \in \mathcal{B}_1(\mathcal{H})$. Thus we conclude.


Theorem 31.3 (First Representation Theorem of Kraus) Given an operation $\phi : \mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})$, there exists a finite or countable family $\{a_j : j \in J\}$ of bounded
linear operators on $\mathcal{H}$, satisfying

$$\sum_{j \in J_0} a_j^* a_j \le I \qquad \text{for all finite } J_0 \subset J, \tag{31.9}$$

such that for every $T \in \mathcal{B}_1(\mathcal{H})$ and every $B \in \mathcal{B}(\mathcal{H})$ one has

$$\phi(T) = \sum_{j \in J} a_j T a_j^* \tag{31.10}$$

respectively

$$\phi^*(B) = \sum_{j \in J} a_j^* B a_j. \tag{31.11}$$

The effect $F$ corresponding to $\phi$ thus has the representation

$$F = \phi^*(I) = \sum_{j \in J} a_j^* a_j. \tag{31.12}$$

In the case that the index set $J$ is infinite, i.e., $J = \mathbb{N}$, the series in (31.10) converges with respect to the trace norm while the series (31.11) and (31.12) converge in the
ultraweak operator topology.




Conversely, if a countable family $\{a_j : j \in J\}$ of bounded linear operators on $\mathcal{H}$
is given which satisfies (31.9) then Eq. (31.10) defines an operation $\phi$ whose adjoint
$\phi^*$ is given by (31.11) and the effect $F$ corresponding to this operation is (31.12).


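Proof Suppose we are given a completely positive map $\phi : \mathcal{B}_1(\mathcal{H}) \to \mathcal{B}_1(\mathcal{H})$
satisfying (31.2). Lemma 31.4 implies that the adjoint map $\phi^* : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ is
completely positive too, thus according to Theorem 30.7 $\phi^*$ is of the form (30.59)

$$\phi^*(B) = V_0^* \pi_0(B) V_0 + \sum_{j \in J} a_j^* B a_j, \qquad B \in \mathcal{B}(\mathcal{H})$$

with bounded linear operators $a_j \in \mathcal{B}(\mathcal{H})$ satisfying

$$\sum_{j \in J} a_j^* a_j \le \phi^*(I)$$

and where the representation $\pi_0$ vanishes for all $B \in \mathcal{B}_c(\mathcal{H})$. According to Corollary
31.1, the bound $\phi^*(I) \le I$ is known and hence condition (31.9) holds.

Proposition 31.1 implies that the map $\phi_1^*(B) = \sum_{j \in J} a_j^* B a_j$ on $\mathcal{B}(\mathcal{H})$ is the
adjoint of the mapping $\phi_1(T) = \sum_{j \in J} a_j T a_j^*$ on $\mathcal{B}_1(\mathcal{H})$. In order to conclude,
we need to determine the map $\phi_0$ on $\mathcal{B}_1(\mathcal{H})$ whose adjoint is the map $\phi_0^*(B) = V_0^* \pi_0(B) V_0$ on $\mathcal{B}(\mathcal{H})$. This map is defined through the duality relation

$$\mathrm{Tr}(\phi_0^*(B) T) = \mathrm{Tr}(B \phi_0(T)) \qquad \text{for all } B \in \mathcal{B}(\mathcal{H}),\ T \in \mathcal{B}_1(\mathcal{H}).$$

Since the representation $\pi_0$ of $\mathcal{B}(\mathcal{H})$ vanishes on the subspace $\mathcal{B}_c(\mathcal{H})$, we know
$\mathrm{Tr}(B \phi_0(T)) = 0$ for all $B \in \mathcal{B}_c(\mathcal{H})$, hence in particular for all $x, y \in \mathcal{H}$ setting
$B = [x, y]$,

$$\langle y, \phi_0(T) x \rangle = \mathrm{Tr}([x, y] \phi_0(T)) = 0,$$

and therefore $\phi_0(T) = 0$ for all $T \in \mathcal{B}_1(\mathcal{H})$ and thus $\phi_0^* = 0$ on $\mathcal{B}(\mathcal{H})$. It follows that $\phi^* = \phi_1^*$, hence $\phi = \phi_1$, and

$$\phi^*(I) = \sum_{j \in J} a_j^* a_j.$$

The converse has already been proven in Proposition 31.1 and Lemma 30.4 with the
bound $B = I$ when we observe Corollary 31.1.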


Remark 31.2 Sometimes one requires that an operation $\phi$ is trace preserving,
i.e., $\mathrm{Tr}(\phi(W)) = 1$ for all density matrices $W$. This will be the case when in our
representation (31.10) the operators $a_j$ satisfy

$$\sum_{j \in J} a_j^* a_j = I. \tag{31.13}$$




In order to prove this recall that according to (31.12) one has

$$\sum_{j \in J} a_j^* a_j = \phi^*(I)$$

and that we know $\phi^*(I) \le I$. The duality relation says

$$\mathrm{Tr}(\phi(W)) = \mathrm{Tr}(\phi^*(I) W)$$

for all density matrices $W$. Thus, if $\phi^*(I) = I$ then $\mathrm{Tr}(\phi(W)) = 1$ for all density
matrices and $\phi$ is trace preserving. Conversely, suppose that the operation $\phi$ is trace
preserving but $\phi^*(I) \ne I$. Then, since $\phi^*(I) \le I$ is known, there is $x \in \mathcal{H}$, $\|x\| = 1$
such that $\langle x, \phi^*(I) x \rangle < 1$. If the density matrix $W = [x, x]$ is inserted into the
duality relation one gets

$$\mathrm{Tr}(\phi([x, x])) = \mathrm{Tr}(\phi^*(I) [x, x]) = \langle x, \phi^*(I) x \rangle < 1,$$

hence, a contradiction and therefore $\phi^*(I) = I$ holds.
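A small worked instance may make the distinction between (31.9) and (31.13) concrete. The sketch below uses the standard amplitude damping operators on $\mathbb{C}^2$ as an example; this choice, the parameter name `gamma` and the helper names are assumptions for illustration and do not come from the text. With both operators the effect $\sum_j a_j^* a_j$ equals $I$ and the trace is preserved; keeping only the first operator gives an operation with $\phi^*(I) \le I$ that lowers the trace.

```python
import numpy as np

def amplitude_damping(gamma):
    """Two operators a_0, a_1 on C^2 with a_0^* a_0 + a_1^* a_1 = I."""
    a0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    a1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return [a0, a1]

def apply(ops, t):
    """phi(T) = sum_j a_j T a_j^*, as in (31.10)."""
    return sum(a @ t @ a.conj().T for a in ops)

def effect(ops):
    """F = phi^*(I) = sum_j a_j^* a_j, as in (31.12)."""
    return sum(a.conj().T @ a for a in ops)

ops = amplitude_damping(0.3)
w = np.array([[0.0, 0.0], [0.0, 1.0]])       # density matrix of the second basis vector
print(np.allclose(effect(ops), np.eye(2)))   # condition (31.13)
print(np.trace(apply(ops, w)))               # trace preserved
print(np.trace(apply(ops[:1], w)))           # only a_0 kept: trace 1 - gamma
```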


31.3. Choi’s Results for Finite Dimensional Completely Positive 
Maps 


Naturally in the case of mappings $f : \mathcal{A} \to \mathcal{B}$ with $\mathcal{A} = M_n(\mathbb{C})$ and $\mathcal{B} = M_m(\mathbb{C})$ we
can use additional structural information to strengthen the statements of Stinespring's
factorization theorem (Theorem 30.6) and to simplify the proofs. This was done
in 1975 by M. Choi [9], with inspiration from electrical circuit theory ($n$-port systems),
by using the simple fact that these matrix algebras $M_n(\mathbb{C})$ have a basis

$$e^{(n;ij)}, \qquad i, j = 1, 2, \dots, n \tag{31.14}$$

where $e^{(n;ij)}$ denotes the $n \times n$ matrix with the entry 1 in the $i$th row and $j$th column
and all other entries 0.
In terms of this basis we can write $a \in M_n(\mathbb{C})$ as follows:

$$a = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} = \sum_{i,j=1}^n a_{ij} e^{(n;ij)}, \qquad a_{ij} \in \mathbb{C}. \tag{31.15}$$

This allows us to determine the general form of a linear map $f : M_n(\mathbb{C}) \to M_m(\mathbb{C})$
easily. For $a \in M_n(\mathbb{C})$ as above one finds by linearity

$$f(a) = \sum_{i,j=1}^n a_{ij} f(e^{(n;ij)}).$$


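Since $f(e^{(n;ij)}) \in M_m(\mathbb{C})$ it has a unique expansion with respect to the basis
$e^{(m;kl)}$, $k, l = 1, 2, \dots, m$,

$$f(e^{(n;ij)}) = \sum_{k,l=1}^m f(e^{(n;ij)})_{kl} \, e^{(m;kl)}, \qquad f(e^{(n;ij)})_{kl} \in \mathbb{C}.$$

Thus, we can say that there is a one-to-one correspondence between linear maps
$f : M_n(\mathbb{C}) \to M_m(\mathbb{C})$ and systems of complex numbers $F^{ij}_{kl}$, $i, j = 1, \dots, n$,
$k, l = 1, \dots, m$ such that

$$f(a) = \sum_{k,l=1}^m \sum_{i,j=1}^n a_{ij} F^{ij}_{kl} \, e^{(m;kl)} \tag{31.16}$$

with $a$ as in (31.15).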


Theorem 31.4 (Choi's Characterization of Completely Positive Maps) For a
linear map $f : M_n(\mathbb{C}) \to M_m(\mathbb{C})$ the following statements are equivalent:

(a) $f$ is $n$-positive, i.e., the map $f_n : M_n(M_n(\mathbb{C})) \to M_n(M_m(\mathbb{C}))$ defined in (30.40)
is positive;
(b) the matrix $C_f \in M_n(M_m(\mathbb{C}))$ defined by

$$C_f = \begin{pmatrix} f(e^{(n;11)}) & \cdots & f(e^{(n;1n)}) \\ \vdots & & \vdots \\ f(e^{(n;n1)}) & \cdots & f(e^{(n;nn)}) \end{pmatrix} \tag{31.17}$$

is positive where $e^{(n;ij)}$ is specified in (31.14); it is called the Choi matrix of $f$.
(c) $f$ has the form

$$f(a) = \sum_{\mu=1}^{nm} V_\mu a V_\mu^*, \qquad a \in M_n(\mathbb{C}) \tag{31.18}$$

with $m \times n$ matrices $V_\mu$, and thus $f$ is completely positive.
Proof If a linear map $f$ is of the form (31.18), it is a straightforward calculation to
show that $f$ is completely positive, just as in the case of the Stinespring factorization.
Thus it is clear that (c) implies (a).
(a) $\Rightarrow$ (b): Note that the matrix $E \in M_n(M_n(\mathbb{C}))$ given by

$$E = \begin{pmatrix} e^{(n;11)} & \cdots & e^{(n;1n)} \\ \vdots & & \vdots \\ e^{(n;n1)} & \cdots & e^{(n;nn)} \end{pmatrix} \tag{31.19}$$




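satisfies $E^* = E$ (since $(e^{(n;ij)})^* = e^{(n;ji)}$) and $E^2 = nE$, thus $E = \tfrac{1}{n} E^* E$ is positive
in $M_n(M_n(\mathbb{C}))$ by Lemma 30.1. Since $f$ is assumed to be $n$-positive, $f_n(E) = C_f$ is
positive in $M_n(M_m(\mathbb{C}))$.

(b) $\Rightarrow$ (c): By definition, the matrix $C_f$ acts on $\mathcal{H}_m^n \cong \mathbb{C}^{nm}$. If (b) is assumed
this matrix is positive and thus its spectrum is contained in $[0, \|C_f\|]$. Its spectral
representation is of the form

$$C_f = \sum_{k=1}^{nm} \lambda_k Q_k, \qquad 0 \le \lambda_k \le \|C_f\|$$

where $Q_k$ is the projector onto the eigenspace corresponding to the eigenvalue $\lambda_k$ (eigenvalues counted with multiplicity).

Denote by $P_i$ the projection from $\mathcal{H}_m^n = \mathcal{H}_m \times \mathcal{H}_m \times \cdots \times \mathcal{H}_m$ ($n$ times) to the
$i$th component $\mathcal{H}_m$, i.e., $P_i(z_1, \dots, z_i, \dots, z_n) = z_i$, for all $z_j \in \mathcal{H}_m$, $j = 1, \dots, n$.
Then (31.17) shows

$$f(e^{(n;ij)}) = P_i C_f P_j^*, \qquad i, j = 1, \dots, n,$$

and the spectral representation thus implies

$$f(e^{(n;ij)}) = \sum_{k=1}^{nm} \lambda_k P_i Q_k P_j^*.$$

The normalized eigenvector $v^{(k)}$ for the eigenvalue $\lambda_k$ belongs to the space $\mathcal{H}_m^n$ and
thus has a decomposition $v^{(k)} = (v_1^{(k)}, \dots, v_n^{(k)})$ with $v_i^{(k)} \in \mathcal{H}_m$ for $i = 1, \dots, n$.
With the standard convention for the tensor product the projector $Q_k$ can be realized
as $Q_k = v^{(k)} \otimes (v^{(k)})^*$. This allows us to rewrite the above formula for $f(e^{(n;ij)})$ as

$$f(e^{(n;ij)}) = \sum_{k=1}^{nm} \lambda_k P_i v^{(k)} \otimes (P_j v^{(k)})^* = \sum_{k=1}^{nm} \lambda_k v_i^{(k)} \otimes (v_j^{(k)})^*.$$

Denote by $e^{(n;i)}$, $i = 1, \dots, n$ the standard basis of $\mathcal{H}_n$. For $k = 1, \dots, mn$ define
linear operators $V^{(k)} : \mathcal{H}_n \to \mathcal{H}_m$ by their action on this basis

$$V^{(k)} e^{(n;i)} = \sqrt{\lambda_k} \, v_i^{(k)}, \qquad i = 1, \dots, n.$$

Hence, we can continue our chain of identities for $f(e^{(n;ij)})$ by

$$f(e^{(n;ij)}) = \sum_{k=1}^{nm} (V^{(k)} e^{(n;i)}) \otimes (V^{(k)} e^{(n;j)})^* = \sum_{k=1}^{nm} V^{(k)} \big( e^{(n;i)} \otimes (e^{(n;j)})^* \big) (V^{(k)})^*$$

or, since $e^{(n;i)} \otimes (e^{(n;j)})^* = e^{(n;ij)}$,

$$f(e^{(n;ij)}) = \sum_{k=1}^{nm} V^{(k)} e^{(n;ij)} (V^{(k)})^* \tag{31.20}$$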
k=1 


496 31 Positive Mappings in Quantum Physics 


and thus by (31.15) f has the form (31.18). This proves (c). 


Remark 31.3

(a) This result of M. D. Choi is quite remarkable. It shows in particular that a linear
map on the matrix algebra $M_n(\mathbb{C})$ with values in $M_m(\mathbb{C})$ is already completely
positive when it is $n$-positive.

(b) In addition, it is shown that such a linear map is $n$-positive whenever it is
$n$-positive on the elements of the (standard) basis.

(c) It determines the explicit form of completely positive maps, which is considerably
more specific than the Stinespring factorization.

(d) In the case of matrix algebras the proof of the Stinespring factorization indicates
that a linear map is completely positive if it is $n^2 \times m$-positive (the dimension
of the space $M_n(\mathbb{C}) \otimes \mathbb{C}^m$ is $n^2 m$).

(e) The map $f \mapsto C_f$ defined in Eq. (31.17) is often called the Jamiolkowski
isomorphism or Choi-Jamiolkowski isomorphism. It appeared first in [10].
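
To make the Choi matrix of Theorem 31.4 concrete, the following minimal sketch assembles $C_f$ block by block for two maps on $M_2(\mathbb{C})$: the identity map (whose Choi matrix is the matrix $E$ from the proof) and the transpose map. The helper names (`matrix_unit`, `choi_matrix`, `quadratic_form`) are hypothetical and chosen only for this illustration; matrices are plain nested lists and indices start at 0. The transpose map is a standard example, not taken from the text, of a positive map whose Choi matrix is not positive, so that it is not completely positive.

```python
# Illustrative sketch only: names and data layout are assumptions, not from the text.

def matrix_unit(n, i, j):
    """Matrix unit e_ij of M_n(C): entry 1 at position (i, j), 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def choi_matrix(f, n, m):
    """nm x nm matrix whose (i, j) block of size m x m is f(e_ij), cf. (31.17)."""
    C = [[0] * (n * m) for _ in range(n * m)]
    for i in range(n):
        for j in range(n):
            block = f(matrix_unit(n, i, j))
            for k in range(m):
                for l in range(m):
                    C[i * m + k][j * m + l] = block[k][l]
    return C

def matmul(A, B):
    """Product of two square nested-list matrices of equal size."""
    size = len(A)
    return [[sum(A[r][s] * B[s][c] for s in range(size)) for c in range(size)]
            for r in range(size)]

def quadratic_form(C, v):
    """<v, C v> for a real vector v and a real matrix C."""
    return sum(v[r] * C[r][c] * v[c] for r in range(len(v)) for c in range(len(v)))

def identity_map(a):
    return [row[:] for row in a]

def transpose_map(a):
    return [list(col) for col in zip(*a)]

E = choi_matrix(identity_map, 2, 2)      # the matrix E of (31.19) for n = 2
C_T = choi_matrix(transpose_map, 2, 2)   # Choi matrix of the transpose map

print(matmul(E, E) == [[2 * x for x in row] for row in E])  # checks E^2 = 2E
print(quadratic_form(C_T, [0, 1, -1, 0]))                   # a negative value
```

The negative quadratic form shows that the Choi matrix of the transpose map is not positive, so by the equivalence of (a) and (b) the transpose map is not 2-positive, although it maps positive matrices to positive matrices.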


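Corollary 31.2 (Finite-Dimensional Representations of $M_n(\mathbb{C})$) Let
$\pi : M_n(\mathbb{C}) \to M_m(\mathbb{C})$ be a finite-dimensional representation of the matrix algebra
$M_n(\mathbb{C})$. Then there are $m \times n$ matrices $V^{\mu}$, $\mu = 1,\dots,mn$, satisfying

$$(V^{\mu})^* V^{\nu} = \delta_{\mu\nu}\, 1_n$$

such that

$$\pi(a) = \sum_{\mu=1}^{nm} V^{\mu}\, a\, (V^{\mu})^* \qquad \forall\, a \in M_n(\mathbb{C}).$$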
k=1 


31.4 Open Quantum Systems, Reduced Dynamics and 
Decoherence 


As an illustration of the importance of some of the mathematical results which we 
have presented thus far (mainly tensor product of Hilbert spaces and of operators, 
density matrices, partial trace, normal states, von Neumann algebras) we discuss in 
this section briefly open systems and decoherence. For further details and proofs we 
refer to the literature [11-13]. 

A (quantum or classical) physical system is described by a von Neumann algebra
$\mathcal{M}$ of operators acting on some separable Hilbert space $\mathcal{H}$, and a subsystem is
described by a certain subalgebra of $\mathcal{M}$. A physical system is practically never
in full isolation; some interaction with the environment takes place. In the theory of
open quantum systems this is modeled by forming a joint system consisting of the
system we are interested in and a second system modeling the environment. If the
first system is modeled by the von Neumann algebra $\mathcal{M}_1$ and the second by the von
Neumann algebra $\mathcal{M}_2$, the joint system is modeled by the von Neumann algebra
$\mathcal{M} = \mathcal{M}_1 \otimes \mathcal{M}_2$ which acts on the Hilbert space $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$ in a natural way.

If the joint system is in a state given by the density matrix $\rho$ on $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$,
then the state $\rho_1$ of the first system as viewed by an observer who observes only the
first system is given by the partial trace of $\rho$ with respect to the second system (see
Theorem 26.7):

$$\rho_1 = \mathrm{Tr}_2(\rho).$$
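
For finite-dimensional factors the partial trace can be computed entry by entry. The sketch below is only an illustration under stated assumptions: matrices are nested lists, the basis vector $e_i \otimes e_k$ of $\mathbb{C}^{d_1} \otimes \mathbb{C}^{d_2}$ is given the index `i * d2 + k`, and the function names are hypothetical. It evaluates $\rho_1 = \mathrm{Tr}_2(\rho)$ for a maximally entangled two-qubit state and for a product state.

```python
# Illustrative sketch only: index convention and names are assumptions.

def partial_trace_second(rho, d1, d2):
    """Trace out the second factor of C^d1 (x) C^d2.

    Assumes the basis vector e_i (x) e_k has index i * d2 + k.
    """
    return [[sum(rho[i * d2 + k][j * d2 + k] for k in range(d2))
             for j in range(d1)] for i in range(d1)]

def kron(a, b):
    """Kronecker product of two square nested-list matrices (same convention)."""
    da, db = len(a), len(b)
    return [[a[i][j] * b[k][l] for j in range(da) for l in range(db)]
            for i in range(da) for k in range(db)]

# Density matrix of the entangled vector (e_0 (x) e_0 + e_1 (x) e_1) / sqrt(2).
bell = [[0.5, 0.0, 0.0, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.5]]

# A product state rho_1 (x) rho_2.
rho_1 = [[0.75, 0.0], [0.0, 0.25]]
rho_2 = [[0.5, 0.5], [0.5, 0.5]]
product = kron(rho_1, rho_2)

print(partial_trace_second(bell, 2, 2))     # half the identity: a mixed state
print(partial_trace_second(product, 2, 2))  # recovers rho_1
```

The first result illustrates that a pure state of the joint system can look mixed to an observer of the first system alone; the second shows that for a product state the partial trace simply returns the first factor.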


The joint system is considered to be closed, and often in applications its Hamilton
operator $H$ is of the form

$$H = H_1 \otimes I + I \otimes H_2 + g H_{\mathrm{int}}$$

with free Hamiltonians $H_j$ of system $j$, $j = 1,2$, an interaction Hamiltonian
$H_{\mathrm{int}}$ describing the coupling between the two systems, and a coupling constant $g$.
The time evolution of the state of the total system then is

$$\rho(t) = e^{-itH} \rho\, e^{itH}.$$

From this the time evolution of the first subsystem is obtained by taking the partial trace

$$\rho_1(t) = \mathrm{Tr}_2(\rho(t)) = T_t(\mathrm{Tr}_2(\rho)).$$

$T_t$ is called the reduced dynamics. One can show that in general it is irreversible.
In the Exercises to Chap. 26 it is shown that the partial trace maps density matrices
on $\mathcal{H}_1 \otimes \mathcal{H}_2$ to density matrices on $\mathcal{H}_1$. In addition it is known that the reduced
dynamics $\{T_t, t \ge 0\}$ is a family of normal completely positive and unital maps (i.e.,
$T_t(1) = 1$) if the initial state is a product state, i.e., $\rho = \rho_1 \otimes \rho_2$ with density matrices
$\rho_j$ on $\mathcal{H}_j$, $j = 1,2$.

Let $\mathcal{M}_1$ be the algebra of observables of the system to be investigated and $\mathcal{M}_2$
the algebra of observables of the environment $E$ describing the rest of the physical
world which one intends to ignore eventually. In the algebraic framework of quantum
physics a notion of decoherence was first introduced in [11]. Decoherence is a well
known concept in quantum theory. It does not involve any new physical laws or
assumptions beyond the established framework of quantum theory. On the contrary,
it is a consequence of the universal application of quantum concepts.

We say that decoherence takes place if there exists a splitting

$$\mathcal{M}_1 = \mathcal{A}_1 \oplus \mathcal{A}_2$$

such that $\mathcal{A}_1$ is a von Neumann algebra on which the reduced dynamics acts reversibly,
i.e., is an automorphism group, and $\mathcal{A}_2$ is a complementing subspace with the property that
the expectation values of all observables in $\mathcal{A}_2$ become small as the time gets large:

$$\lim_{t \to \infty} \varphi(T_t(a)) = 0 \qquad \text{for all } a \in \mathcal{A}_2.$$

Any observable $a = a^* \in \mathcal{M}_1$ can thus be written as $a = a_1 + a_2$ with $a_i \in \mathcal{A}_i$,
where $a_2$ becomes unobservable if we wait long enough. After a sufficiently long
time the system is effectively described by the algebra $\mathcal{A}_1$ and behaves like a closed
system, but it may have properties different from the original one. By analyzing the
structure of the algebra of effective observables $\mathcal{A}_1$ and the dynamics $\alpha_t = T_t|_{\mathcal{A}_1}$ we
can classify different scenarios of decoherence: environment induced superselection
rules, pointer states, classical systems, new quantum behavior, ergodicity and the
role decoherence plays in the interpretation of measurements in quantum theory.
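
A toy model, which is an assumption made here for illustration and is not taken from the text, shows what such a splitting can look like. On $M_2(\mathbb{C})$ consider the unital dephasing maps that keep the diagonal entries of an observable and damp the off-diagonal entries by $e^{-\gamma t}$. The diagonal matrices play the role of $\mathcal{A}_1$ (the dynamics acts trivially there), the off-diagonal matrices play the role of $\mathcal{A}_2$. The rate `gamma` and all function names are hypothetical.

```python
import math

# Toy model only: the dephasing dynamics and all names are assumptions.

def dephasing(a, t, gamma=1.0):
    """Heisenberg-picture map T_t on 2 x 2 matrices: diagonal entries are kept,
    off-diagonal entries are multiplied by exp(-gamma * t)."""
    damp = math.exp(-gamma * t)
    return [[a[0][0], damp * a[0][1]],
            [damp * a[1][0], a[1][1]]]

def expectation(rho, a):
    """Tr(rho a) for 2 x 2 matrices."""
    return sum(rho[i][j] * a[j][i] for i in range(2) for j in range(2))

rho_plus = [[0.5, 0.5], [0.5, 0.5]]   # a state with coherences
sigma_x = [[0, 1], [1, 0]]            # off-diagonal observable
sigma_z = [[1, 0], [0, -1]]           # diagonal observable

for t in (0.0, 1.0, 5.0, 20.0):
    # Expectation of the off-diagonal observable decays with t.
    print(t, expectation(rho_plus, dephasing(sigma_x, t)))
```

In this sketch the expectation of the off-diagonal observable decays like $e^{-\gamma t}$, while expectations of diagonal observables do not change in time, which mirrors the roles of $\mathcal{A}_2$ and $\mathcal{A}_1$ in the definition above.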

Environment induced decoherence is an asymptotic property of time evolution 
in suppressing interference between possible events and their complements, thus 
rendering them “for all practical purposes” mutually exclusive [14-16].


31.5 Exercises 


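1. For $k = 2,3,\dots$ and a separable Hilbert space $\mathcal{H}$ denote $\mathcal{H}^k = \mathcal{H} \times \mathcal{H} \times \cdots \times \mathcal{H}$
($k$ components). With the standard operations and the natural scalar product $\mathcal{H}^k$
is a Hilbert space in which the given Hilbert space is embedded by isometric
mappings $J_1 : \mathcal{H} \to \mathcal{H} \times \{0\} \times \cdots \times \{0\}$, $J_2 : \mathcal{H} \to \{0\} \times \mathcal{H} \times \{0\} \times \cdots \times \{0\}$,
..., $J_k : \mathcal{H} \to \{0\} \times \cdots \times \{0\} \times \mathcal{H}$. Show: If $B_i = \{e^{(i)}_j : j \in \mathbb{N}\}$ is an
orthonormal basis of $\mathcal{H}$ for $i = 1,\dots,k$, then $J_1(B_1) \cup J_2(B_2) \cup \cdots \cup J_k(B_k)$ is an
orthonormal basis of $\mathcal{H}^k$.

2. Using the notation introduced in the text show $M_k(\mathcal{B}(\mathcal{H})) = \mathcal{B}(\mathcal{H}^k)$ and
$M_k(\mathcal{B}_1(\mathcal{H})) = \mathcal{B}_1(\mathcal{H}^k)$.

Hints: In order to show $M_k(\mathcal{B}_1(\mathcal{H})) \subseteq \mathcal{B}_1(\mathcal{H}^k)$, use a suitable characterization of
trace class operators as given in Proposition 26.1 and observe Exercise 1.

3. Observe Exercise 1 to prove the “trace formula”

$$\mathrm{Tr}_{\mathcal{H}^k}([T_{ij}]) = \sum_{j=1}^{k} \mathrm{Tr}_{\mathcal{H}}(T_{jj})$$

for $[T_{ij}] \in M_k(\mathcal{B}_1(\mathcal{H}))$.

4. Gleason’s theorem does not hold in $\mathbb{C}^2$: According to Remark 31.1, additive
measures $\mu$ on orthogonal projections of the Hilbert space $\mathbb{C}^2$ are in a one-to-one
correspondence with functions $f$ on the unit vectors $e \in \mathbb{R}^3$ with values in $[0, 1]$
satisfying $f(e) + f(-e) = 1$. In this Exercise we construct a function $f$ such
that the corresponding measure $\mu$ is not given by a trace.

Fix a unit vector $e_1 \in \mathbb{R}^3$ and define $f_{e_1}$ as follows: For a unit vector $n$ define

$$f_{e_1}(n) = \begin{cases} 0 & \text{if } n \cdot e_1 < 0, \\ 1 & \text{if } n \cdot e_1 > 0. \end{cases}$$

Clearly, $f_{e_1}$ satisfies $f_{e_1}(n) + f_{e_1}(-n) = 1$ whenever $n \cdot e_1 \neq 0$. Next take two unit vectors $e_2$ and $e_3$
in the plane perpendicular to $e_1$ such that $e_2 \cdot e_3 = 0$ and define

$$f_{e_2}(n) = \begin{cases} 0 & \text{if } n \in e_1^{\perp},\ n \cdot e_2 < 0, \\ 1 & \text{if } n \in e_1^{\perp},\ n \cdot e_2 > 0. \end{cases}$$

Finally, define on unit vectors $n \in \mathbb{R}^3$

$$f(n) = \begin{cases} f_{e_1}(n) & \text{if } n \cdot e_1 \neq 0, \\ f_{e_2}(n) & \text{if } n \cdot e_1 = 0,\ n \cdot e_2 \neq 0, \\ 1 & \text{if } n = e_3, \\ 0 & \text{if } n = -e_3. \end{cases}$$

Now verify that $f$ is well defined on all unit vectors in $\mathbb{R}^3$, satisfies $f(n) + f(-n) = 1$,
but is not of the form as stated in Gleason’s theorem for the case of Hilbert
spaces of dimension $\ge 3$.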


References

1. Gleason AM. Measures on the closed subspaces of a Hilbert space. J Math Mech. 1957;6:885-893.

2. Chernoff PR. Andy Gleason and quantum mechanics. Notices AMS. 2009;56(10):1253-1259.

3. Busch P. Quantum states and generalized observables: a simple proof of Gleason’s theorem.
Phys Rev Lett. 2003;91(12/120403):1-4.

4. Richman F, Bridges D. A constructive proof of Gleason’s theorem. J Funct Anal. 1999;162:287-312.

5. Kraus K. States, effects, and operations. Lecture notes in physics. Vol. 190. Berlin:
Springer-Verlag; 1983.

6. Busch P. Unsharp reality and joint measurements for spin observables. Phys Rev D.
1986;33:2253-2261.

7. Cook C, Keane M, Moran W. An elementary proof of Gleason’s theorem. Math Proc Camb
Philos Soc. 1985;98:117-128.

8. Hughes R. The structure and interpretation of quantum mechanics. Harvard University Press;
1989.

9. Choi MD. Completely positive linear maps on complex matrices. Linear Algebra Appl.
1975;10:285-290.

10. Jamiolkowski A. Linear transformations which preserve trace and positive semidefiniteness of
operators. Rep Math Phys. 1972;3:275.

11. Blanchard P, Olkiewicz R. Decoherence induced transition from quantum to classical dynamics.
Rev Math Phys. 2003;15:217-243.

12. Blanchard P, Olkiewicz R. Decoherence as irreversible dynamical process in open quantum
systems. In: Joye A, Attal S, Pillet CA, editors. Open quantum systems III: recent developments.
Lecture notes in mathematics. Vol. 1882. Berlin: Springer-Verlag; 2006. pp. 117-159.

13. Blanchard P, Hellmich M. Decoherence in infinite quantum systems. In: Brüning E, Konrad
T, Petruccione F, editors. Quantum Africa 2010: theoretical and experimental foundations of
recent quantum technology. Vol. 1469. AIP Conference Proceedings; 2012. pp. 2-15.

14. Fröhlich J, Schubnel B. Do we understand quantum mechanics - finally? In: Reiter WL,
Yngvason J, editors. Erwin Schrödinger - 50 years after. ESI lectures in mathematics and physics.
Vol. 9. Zürich: European Mathematical Society, American Mathematical Society; 2013.

15. Omnès R. The interpretation of quantum mechanics. Princeton, NJ: Princeton University Press;
1994.

16. Schlosshauer M. Decoherence and the quantum-to-classical transition. 2nd ed. Berlin:
Springer; 2007.


Part III
Variational Methods 


Chapter 32 
Introduction 


The first two parts of this book were devoted to generalized functions and Hilbert 
spaces whose operators are primarily of importance for quantum mechanics and 
quantum field theory. These two physical theories were born and developed in the 
twentieth century. In sharp contrast to this are the variational methods which have a 
much longer history. In 1744, L. Euler published a first textbook on what soon after 
was called the calculus of variations, with the title “A method for finding curves
enjoying certain maximum or minimum properties.” In terms of the calculus which had
recently been invented by Leibniz and Newton, optimal curves were determined by 
Euler. Depending on the case which is under investigation optimal means “maximal” 
or “minimal.” Though not under the same name, the calculus of variations is actually 
older and closely related to the invention and development of differential calculus, 
since already in 1684 Leibniz’ first publication on differential calculus appeared
under the title Nova methodus pro maximis et minimis itemque tangentibus. This can be
considered as the beginning of a mathematical theory which intends to solve problems
of “optimization” through methods of analysis and functional analysis. Later
in the twentieth century methods of topology were also used for this. Here “optimal” 
can mean a lot of different things, for instance: shortest distance between two points 
in space, optimal shapes or forms (of buildings, of plane wings, of natural objects), 
largest area enclosed by a fence of given length, minimal losses (of a company in 
difficult circumstances), and maximal profits (as a general objective of a company). 
And in this wider sense of “finding optimal solutions” as part of human nature or as 
part of human belief that in nature an optimal solution exists and is realized there, the 
calculus of variations goes back more than 2000 years to ancient Greece. In short, 
the calculus of variations has a long and fascinating history. However, “variational 
methods” are not a mathematical theory of the past, related to classical physics, but 
an active area of modern mathematical research as the numerous publications in this 
field show, with many practical or potential applications in science, engineering, and 
economics. Clearly this means for us that in this short third part we will be able to 
present only the basic aspects of one direction of the modern developments in the 
calculus of variations, namely those with close links to the previous parts, mainly to 
Hilbert space methods. 


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_32




32.1 Roads to Calculus of Variations 


According to legend Queen Dido, fleeing from Tyre, a Phoenician city ruled by 
King Pygmalion, her tyrannical brother, and arriving at the site that was later called
Carthage, sought to purchase land from the natives. They asserted that they were
willing to sell only as much ground as she could surround with a bull’s hide. Dido 
accepted the deal and cut a bull’s hide into very narrow strips which she pieced 
together to form the longest possible strip. She reasoned that the maximal area 
should be obtained by shaping the strip into the circumference of a circle. A complete 
mathematical proof of Dido’s claim as the best possible choice was not achieved until 
the nineteenth century. Today one still speaks of the general problem of Dido as an 
isoperimetric problem but where this adjective has the much wider interpretation as 
referring to any problem in which an extremum is to be determined subject to one 
or more constraints, for instance the problem of finding the form which will give the 
greatest volume within a fixed surface area. 

Heron of Alexandria postulated a minimum principle for optics and deduced the 
law of reflection of light for a straight mirror. In 1662 Fermat generalized Heron’s 
principle by postulating a principle of least time for the propagation of light. Later 
several other principles of optimality (minima or maxima) were formulated about 
fundamental physical quantities such as energy, action, entropy, separation in the 
space-time of special relativity. In other fields of science one knows such principles 
too. In probability and statistics we have “least squares” and “maximum likelihood”
laws. Minimax principles are fundamental in game theory, statistical decision theory, 
and mathematical economics. 

In short, the calculus of variations can be described as the generalization of the
method to solve problems of minima and maxima by elementary calculus, a generalization
to the case of infinitely many variables, i.e., to infinite dimensional spaces.
In 1744, Euler explained and extended the maxi-minimal notions of Newton, of
Bernoulli and Maupertuis. His 1753 “Dissertatio de principio minimae actionis” associates
him with Lagrange as one of the inventors of the calculus of variations, in its
analytic form. In 1696, Jean Bernoulli posed the problem of determining the path of
fastest descent of a point mass, i.e., the brachistochrone problem. This problem was
typical for the problems considered at that time since it required finding an unknown
function $y = f(x)$ which minimizes or maximizes an integral of the form

$$S(f) = \int_a^b L(x, f(x), f'(x))\, dx.$$


Such an integral is a function on a function space or a functional, a name introduced 
by J. Hadamard and widely used nowadays. 

Another famous problem whose solution has been a paradigm in the calculus of
variations now for about a century is the so-called Dirichlet problem. In this problem
one is asked to find a differentiable function $f$ whose derivatives are square integrable
over a domain $\Omega \subset \mathbb{R}^d$ and which has prescribed values on the boundary $\partial\Omega$, i.e.,
$f|_{\partial\Omega} = g$ where $g$ is some given function on $\partial\Omega$, so that the “Dirichlet integral”

$$I(f) = \int_{\Omega} |Df(x)|^2\, dx$$

is minimal. (Such a problem arises, for instance, in electrostatics for the electric
potential $f$.) The existence of a solution of the Dirichlet problem was first taken for
granted, since the integrand is nonnegative. It was only Weierstrass, around 1870,
who pointed out that there are variational problems without a solution, i.e., in modern
language for which there is no minimum though the functional has a finite infimum.
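
A classical example of this kind, usually attributed to Weierstrass and stated here as an illustration rather than as part of the text, is the functional $J(u) = \int_{-1}^{1} (x\, u'(x))^2\, dx$ on continuously differentiable functions with $u(-1) = -1$ and $u(1) = 1$. Its infimum is $0$, but $J(u) = 0$ would force $u$ to be constant, which is incompatible with the boundary values. The sketch below evaluates $J$ numerically along the admissible sequence $u_n(x) = \arctan(nx)/\arctan(n)$; the function name and the midpoint rule with `steps` subintervals are choices made only for this illustration.

```python
import math

# Illustrative sketch only: the example functional and the quadrature are assumptions.

def weierstrass_energy(n, steps=20000):
    """Midpoint-rule value of J(u_n) = int_{-1}^{1} (x * u_n'(x))**2 dx
    for u_n(x) = atan(n x) / atan(n), which has u_n(-1) = -1 and u_n(1) = 1."""
    h = 2.0 / steps
    total = 0.0
    for s in range(steps):
        x = -1.0 + (s + 0.5) * h
        du = n / ((1.0 + (n * x) ** 2) * math.atan(n))  # derivative u_n'(x)
        total += (x * du) ** 2 * h
    return total

energies = [weierstrass_energy(n) for n in (1, 10, 100)]
for n, value in zip((1, 10, 100), energies):
    print(n, value)   # the values decrease towards the infimum 0
```

The computed values decrease towards $0$ as $n$ grows, while the functions $u_n$ approach a step function that is not admissible, which is exactly the situation of a finite infimum without a minimizer.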

Under natural technical assumptions, the existence of a minimizing function f 
of the Dirichlet integral was proven by D. Hilbert in 1899. The decisive discoveries 
which allowed Hilbert to prove this result were the notion of the “weak topology” 
on spaces which today are called Hilbert spaces and precompactness with respect to 
this weak topology of bounded sets (compare the introduction to Part B). 

For readers who are interested in a more extensive exposition of the fascinating 
history of the calculus of variations we recommend the books [1, 2] for a start. 
An impressive account of the great diversity of variational methods is given in an 
informal way in the recent book [3]. 


32.2 Classical Approach Versus Direct Methods 


Historically, the calculus of variations started with one-dimensional problems. In
these cases one tries to find an extremal point (minimum or maximum) of functionals
of the form

$$f(u) = \int_a^b F(t, u(t), u'(t))\, dt \tag{32.1}$$

over all functions $u \in M = \{v \in C^2([a,b], \mathbb{R}^m) : v(a) = u_0,\ v(b) = u_1\}$ where
$u_0, u_1 \in \mathbb{R}^m$ are given points and $m \in \mathbb{N}$. The integrand $F : [a,b] \times \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$
is typically assumed to be of class $C^2$ in all variables. A familiar example is the action
functional of Lagrangian mechanics. In this case the integrand $F$ is just the Lagrange
function $L$ which for a particle of mass $m$ moving in the force field of a potential $V$
is $L(t,u,u') = \frac{m}{2} u'^2 - V(u)$.

There is a counterpart in dimensions $d > 1$. Let $\Omega \subset \mathbb{R}^d$ be an open nonempty
set and $F : \Omega \times \mathbb{R}^m \times M_{md} \to \mathbb{R}$ a function of class $C^2$ where $M_{md}$ is the space
of all $m \times d$ matrices; for $u : \Omega \to \mathbb{R}^m$ denote by $Du(x)$ the $m \times d$ matrix of first
derivatives of $u$. Then, under suitable integrability assumptions a functional $f(u)$ is
well defined by the integral

$$f(u) = \int_{\Omega} F(x, u(x), Du(x))\, dx. \tag{32.2}$$

Such functionals are usually studied under some restrictions on $u$ on the boundary
$\partial\Omega$ of $\Omega$, for instance the so-called Dirichlet boundary condition $u|_{\partial\Omega} = u_0$ where
$u_0$ is some given function $\partial\Omega \to \mathbb{R}^m$.




In elementary calculus we find extremal points of a function $f(x)$, $x \in \mathbb{R}$, of
class $C^2$ by determining first the points $x_i$ at which the derivative of $f$ vanishes, and
then deciding whether a point $x_i$ gives a local minimum or a local maximum or a
stationary point of $f$ according to the value $f''(x_i)$ of the second derivative.

The classical approach for functionals of the form (32.1) follows in principle
the same strategy, though the concepts of differentiation are more involved since
differentiation with respect to variables in an infinite dimensional function space is
required. The necessary definitions and basic results about this differential calculus
in Banach spaces are developed in the next chapter. Thus, in a first step we have to find
the zeros of the first derivative $f'$, i.e., solutions of the Euler-Lagrange equation

$$f'(u) = 0. \tag{32.3}$$

For functionals of the form (32.1), this equation is equivalent to a second order
ordinary differential equation for the unknown function $u$ (see for instance [4] or
[2]). If the second derivative $f''(u)$ is positive (in a sense which has to be defined),
then the functional $f$ has a local minimum at the function $u$. If this applies to the
functional $-f$, then $f$ has a local maximum at $u$.
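
As a worked illustration, and assuming the standard form of the Euler-Lagrange equation for integrands $F(t,u,u')$ (the text itself defers the derivation to [4] and [2]), the second order equation for the Lagrange function $L(t,u,u') = \frac{m}{2} u'^2 - V(u)$ of a particle of mass $m$ in a potential $V$ can be written out as follows. It reproduces Newton's equation of motion.

```latex
% Standard Euler-Lagrange equation for f(u) = \int_a^b F(t,u(t),u'(t))\,dt
% (assumed form; see the references cited in the text for the derivation):
\frac{d}{dt}\,\frac{\partial F}{\partial u'}\bigl(t,u(t),u'(t)\bigr)
  - \frac{\partial F}{\partial u}\bigl(t,u(t),u'(t)\bigr) = 0,
  \qquad a < t < b .

% For L(t,u,u') = \tfrac{m}{2}\,|u'|^2 - V(u) one has
\frac{\partial L}{\partial u'} = m\,u', \qquad
\frac{\partial L}{\partial u} = -\nabla V(u),

% so the Euler-Lagrange equation becomes
m\,u''(t) = -\nabla V\bigl(u(t)\bigr).
```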

If only the problem of existence of an extremal point is considered there is another
strategy available. In order to understand it, it is important to recall Weierstrass’
theorem and its proof: A lower (upper) semicontinuous function $f$ has a finite minimum
(a finite maximum) on a closed and bounded interval $[a, b]$. Here it is essential that
closed and bounded sets in $\mathbb{R}$ are compact, i.e., infinite sequences in $[a,b]$ have a
convergent subsequence.

This strategy too has a very successful counterpart for functionals of the form 
(32.1) or (32.2). It is called the direct method of the calculus of variations. We give 
a brief description of its basic steps. 


1. Suppose $M$ is a subset of the domain of the functional $f$ and we want to find a
minimum of $f$ on $M$.

2. Through assumptions on $f$ and/or $M$, assure that $f$ has a finite infimum on $M$,
i.e.,

$$\inf_{u \in M} f(u) = I(f,M) = I > -\infty. \tag{32.4}$$

Then, there is a minimizing sequence $(u_n)_{n \in \mathbb{N}} \subset M$, i.e., a sequence in $M$ such
that

$$\lim_{n \to \infty} f(u_n) = I. \tag{32.5}$$

3. Suppose that we can find one minimizing sequence $(u_n)_{n \in \mathbb{N}} \subset M$ such that

$$u = \lim_{n \to \infty} u_n \in M, \tag{32.6}$$

$$f(u) \le \liminf_{n \to \infty} f(u_n); \tag{32.7}$$

then the minimization problem is solved since then we have

$$I \le f(u) \le \liminf_{n \to \infty} f(u_n) = I,$$

where the first inequality holds because of $u \in M$ and where the final identity
holds because $(u_n)_{n \in \mathbb{N}}$ is a minimizing sequence. Obviously, for Eq. (32.6) a
topology has to be specified on $M$.

4. Certainly, it is practically impossible to find one minimizing sequence with the 
two properties given above. Thus, in explicit implementations of this strategy 
one works under conditions where the two properties hold for all convergent 
sequences, with respect to a suitable topology. If one looks at the proof of Weierstrass’
theorem one expects to get a convergent minimizing sequence by taking
a suitable subsequence of a given minimizing sequence. Recall: The coarser the 
topology is, the easier it is for a sequence to have a convergent subsequence and 
to have a limit point, i.e., to have Eq. (32.6). On the other hand, the stronger the 
topology is the easier it is to satisfy inequality (32.7) which is a condition of lower 
semicontinuity. 

5. The paradigmatic solution of this problem in infinite dimensional spaces is due 
to Hilbert who suggested using the weak topology, the main reason being that 
in a Hilbert space bounded sets are relatively sequentially compact for the weak 
topology while for the norm topology there are not too many compact sets of 
interest. Thus, suppose that M is a weakly closed subset of a reflexive Banach 
space and that minimizing sequences are bounded (with respect to the norm). 
Then there is a weakly convergent subsequence whose weak limit belongs to M. 
Thus in order to conclude one verifies that inequality (32.7) holds for all weakly 
convergent sequences, i.e., that f is lower semicontinuous for the weak topology. 
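
The difference between weak and norm convergence in step 5 can be made tangible with a standard example, which is an assumption of this sketch and not taken from the text: in $L^2(0,\pi)$ the functions $u_n(x) = \sin(nx)$ converge weakly to $0$, while their norms stay equal to $\sqrt{\pi/2}$. The code below checks this numerically against a single test function $g(x) = x(\pi - x)$, for which $\langle u_n, g\rangle = 4/n^3$ when $n$ is odd; one test function of course does not prove weak convergence, it only illustrates it. All names are hypothetical.

```python
import math

# Illustrative sketch only: example sequence, test function and names are assumptions.

def integrate(func, a, b, steps=20000):
    """Midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (s + 0.5) * h) for s in range(steps)) * h

def inner_with_test_function(n):
    """<u_n, g> in L^2(0, pi) for u_n(x) = sin(n x) and g(x) = x (pi - x)."""
    return integrate(lambda x: math.sin(n * x) * x * (math.pi - x), 0.0, math.pi)

def norm_squared(n):
    """||u_n||^2 in L^2(0, pi) for u_n(x) = sin(n x)."""
    return integrate(lambda x: math.sin(n * x) ** 2, 0.0, math.pi)

for n in (1, 11, 51):
    # The inner product tends to 0 while the squared norm stays near pi / 2.
    print(n, inner_with_test_function(n), norm_squared(n))
```

For the norm functional $f(u) = \|u\|$ this gives $f(0) = 0 < \liminf_n f(u_n) = \sqrt{\pi/2}$, so inequality (32.7) holds along this weakly convergent sequence, but with strict inequality: the norm is weakly lower semicontinuous without being weakly continuous.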


In the following chapter the concepts and results which have been used above will be 
explained and some concrete existence results for extremal points will be formulated 
where the above strategy is implemented. 

Suppose that with the direct methods of the calculus of variations we managed to
show the existence of a local minimum of the functional $f$ and that this functional
is differentiable (in the sense of the classical methods). Then, if the local minimum
occurs at an interior point $u_0$ of the domain of $f$, the Euler-Lagrange equation
$f'(u_0) = 0$ holds and thus we have found a solution of this equation. If the functional
$f$ has the form (32.1) (or (32.2)), then the equation $f'(u_0) = 0$ is a nonlinear ordinary
(partial) differential equation and thus the direct methods become a powerful tool for
solving nonlinear ordinary and partial differential equations. Some modern implementations
of this strategy with many new results on nonlinear (partial) differential
equations are described in good detail in the books [2, 4-7], in a variety of
directions.

Note that a functional f can have other critical points than local extrema. These 
other critical points are typically not obtained by the direct methods as described 
above. However, there are other, often topological methods of global analysis by 
which the existence of these other critical points can be established. We mention 




the minimax methods, index theory, and mountain pass lemmas. These methods are 
developed and applied in [2, 7, 8]. But we cannot present them here. 


32.3 The Objectives of the Following Chapters


The overall strategy of Part III has been explained in the Introduction. The next 
chapter on direct methods is the abstract core of this part of the book. There we 
present some general existence results for extrema of functionals which one can call 
generalized Weierstrass theorems. Since the realization of all the hypotheses in these
results is not obvious, some concrete ways of implementing them are discussed in 
some detail. 

The following chapter introduces differential calculus on Banach spaces and 
proves those results which are needed for the “classical approach” of the variational 
methods. 

On the basis of the differential calculus on Banach spaces, the third chapter 
formulates in great generality the Lagrange multiplier method and proves in a fairly 
general setting the existence of such a multiplier. When applied to linear or nonlinear 
partial differential operators the existence of a Lagrange multiplier is equivalent to the 
existence of an eigenvalue. Thus, this chapter is of particular importance for spectral 
theory of linear and nonlinear partial differential operators. In the fourth chapter 
we continue this topic and determine explicitly the spectrum of some linear second 
order partial differential operators. In particular the spectral theorem for compact 
self-adjoint operators is proven. 

The final chapter presents the mathematical basis of the Hohenberg-Kohn density
functional theory, which is the starting point of various concrete methods used mainly
in chemistry. It is based on the theory of Schrödinger operators for N-particle systems
which was introduced and discussed in Part II for N = 1.


References 


1. Goldstine HH. A history of the calculus of variations from the 17th through the 19th century. 
Studies in the history of mathematics and physical sciences. Vol. 5. New York: Springer; 1980. 

2. Blanchard Ph, Brüning E. Variational methods in mathematical physics. A unified approach.
Texts and monographs in physics. Berlin: Springer; 1992. 

3. Hildebrandt S, Tromba A. The parsimonious universe. Shape and form in the natural world. 
Berlin: Springer; 1996. 

4. Jost J, Li-Jost X. Calculus of variations. Cambridge Studies in advanced mathematics. Vol. 64. 
Cambridge: Cambridge University Press; 1998. 

5. Dacorogna B. Weak continuity and weak lower semicontinuity of non-linear functionals. Lecture 
notes in mathematics. Vol. 922. Berlin: Springer; 1982. 

6. Dacorogna B. Direct methods in the calculus of variations. Applied mathematical sciences. 
Vol. 78. Berlin: Springer; 1989. 




7. Struwe M. Variational methods: applications to nonlinear partial differential equations and 
Hamiltonian systems. Ergebnisse der Mathematik und ihrer Grenzgebiete, Folge 3. Vol. 34, 
3rd ed. Berlin: Springer; 2000. 

8. Zeidler E. Variational methods and optimization. Nonlinear functional analysis and its 
applications. Vol. 3. New York: Springer; 1985. 


Chapter 33 
Direct Methods in the Calculus of Variations 


33.1 General Existence Results 


From Chap. 32, we know that semicontinuity plays a fundamental role in direct methods
in the calculus of variations. Accordingly, we recall the definition and the basic
characterization of lower semicontinuity. Upper semicontinuity of a function $f$ is
just lower semicontinuity of $-f$.


Definition 33.1 Let $M$ be a Hausdorff space. A function $f : M \to \mathbb{R} \cup \{+\infty\}$ is
called lower semicontinuous at a point $x_0 \in M$ if, and only if, $x_0$ is an interior
point of the set $\{x \in M : f(x) > f(x_0) - \varepsilon\}$ for every $\varepsilon > 0$. $f$ is called lower
semicontinuous on $M$ if, and only if, $f$ is lower semicontinuous at every point
$x_0 \in M$.


Lemma 33.1 Let $M$ be a Hausdorff space and $f : M \to \mathbb{R} \cup \{+\infty\}$ a function on
$M$.

a) If $f$ is lower semicontinuous at $x_0 \in M$, then for every sequence $(x_n)_{n \in \mathbb{N}} \subset M$
converging to $x_0$, one has

$$f(x_0) \le \liminf_{n \to \infty} f(x_n). \tag{33.1}$$

b) If $M$ satisfies the first axiom of countability, i.e., if every point of $M$ has a countable
neighborhood basis, then the converse of (a) holds.


Proof For the simple proof we refer to the Exercises.

In Chap. 32, we also learned that compactness plays a fundamental role too; more
precisely, the direct methods use sequential compactness in a decisive way.


Definition 33.2 Let $M$ be a Hausdorff space. A subset $K \subset M$ is called sequentially
compact if, and only if, every infinite sequence in $K$ has a subsequence which
converges in $K$.


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_33




The following fundamental result proves the existence of a minimum. Replacing
f by —f it can easily be translated into a result on the existence of a maximum. 


Theorem 33.1 (Existence of a Minimizer) Let $f : M \to \mathbb{R} \cup \{+\infty\}$ be a lower
semicontinuous function on the Hausdorff space $M$. Suppose that there is a real
number $r$ such that

a) $[f \le r] = \{x \in M : f(x) \le r\} \neq \emptyset$ and
b) $[f \le r]$ is sequentially compact.

Then there is a minimizing point $x_0$ for $f$ on $M$:

$$f(x_0) = \inf_{x \in M} f(x). \tag{33.2}$$


Proof We begin by showing indirectly that $f$ is lower bounded. If $f$ is not bounded
from below there is a sequence $(x_n)_{n \in \mathbb{N}}$ such that $f(x_n) < -n$ for all $n \in \mathbb{N}$. For
sufficiently large $n$ the elements of the sequence belong to the set $[f \le r]$, hence
there is a subsequence $y_j = x_{n(j)}$ which converges to a point $y \in M$. Since $f$ is
lower semicontinuous we know $f(y) \le \liminf_{j \to \infty} f(y_j)$, a contradiction since
$f(y_j) < -n(j) \to -\infty$. We conclude that $f$ is bounded from below and thus has a
finite infimum:

$$-\infty < I = I(f,M) = \inf_{x \in M} f(x) \le r.$$

Therefore, there is a minimizing sequence $(x_n)_{n \in \mathbb{N}}$ whose elements belong to $[f \le r]$
for all sufficiently large $n$. Since $[f \le r]$ is sequentially compact there is again a
subsequence $y_j = x_{n(j)}$ which converges to a unique point $x_0 \in [f \le r]$. Since $f$ is
lower semicontinuous we conclude

$$I \le f(x_0) \le \liminf_{j \to \infty} f(y_j) = \lim_{j \to \infty} f(y_j) = I.$$


Under the condition of the following theorem one has uniqueness of the minimizer: 


Theorem 33.2 (Uniqueness of Minimizer) Suppose $M$ is a convex set in a vector
space $E$ and $f : M \to \mathbb{R}$ is a strictly convex function on $M$. Then $f$ has at most one
minimizing point in $M$.

Proof Suppose there are two different minimizing points $x_0$ and $y_0$ in $M$. Since
$M$ is convex all points $x(t) = t x_0 + (1 - t) y_0$, $0 < t < 1$, belong to $M$ and
therefore $f(x_0) = f(y_0) \le f(x(t))$. Since $f$ is strictly convex we know $f(x(t)) <
t f(x_0) + (1 - t) f(y_0) = f(x_0)$ and therefore the contradiction $f(x_0) < f(x_0)$. Thus,
there is at most one minimizing point.




33.2 Minimization in Banach Spaces 


In interesting minimization problems we typically have at our disposal much more 
information about the set M and the function f than we have assumed in Theo- 
rem 33.1. If, for instance, one is interested in minimizing the functional (32.2) one 
would prefer to work in a suitable Banach space of functions, usually a Sobolev 
space. These function spaces and their properties are an essential input for applying 
them in the direct methods. A concise introduction to the most important of these 
function spaces can be found in [1]. 

Concerning the choice of a topology on Banach spaces which is suitable for the 
direct methods (compare our discussion in Chap. 1) we begin by recalling the well- 
known result of Riesz (see Theorem 19.1): The closed unit ball of a normed space 
is compact (for the norm topology) if, and only if, this space is finite dimensional. 
Thus, in infinite-dimensional Banach spaces compact sets have an empty interior, and 
therefore are not of much interest for most purposes of analysis, in particular not for 
the direct methods. Which other topology can be used? Recall that Weierstrass’ result 
on the existence of extrema of continuous functions on closed and bounded sets uses 
in an essential way that in finite dimensional Euclidean spaces a set is compact if, and 
only if, it is closed and bounded. A topology with such a characterization of closed 
and bounded sets is known for infinite dimensional Banach spaces too, the weak 
topology. Suppose $E$ is a Banach space and $E'$ is its topological dual space. Then
the weak topology $\sigma = \sigma(E, E')$ on $E$ is defined by the system $\{q_u(\cdot) : u \in E'\}$ of
seminorms $q_u$, $q_u(x) = |u(x)|$ for all $x \in E$. In most applications one can actually
use a reflexive Banach space and there the following important result is available. 


Lemma 33.2 In a reflexive Banach space $E$ every bounded set (for the norm) is
relatively compact for the weak topology $\sigma(E, E')$.

A fairly detailed discussion about compact and weakly compact sets in Banach 
spaces, as they are relevant for the direct methods, is given in the Appendix of [2]. 
Prominent examples of reflexive Banach spaces are Hilbert spaces (see Chap. 18), 
the Lebesgue spaces $L^p$ for $1 < p < \infty$, and the corresponding Sobolev spaces
$W^{m,p}$, $m = 1, 2, \ldots$, $1 < p < \infty$.

Accordingly, we decide to use mainly reflexive Banach spaces for the direct 
methods, whenever this is possible. Then, with the help of Lemma 33.2, we always 
get weakly convergent minimizing sequences whenever we can show that bounded 
minimizing sequences exist. Thus, the problem of lower semicontinuity of the func- 
tional f for the weak topology remains. This is unfortunately not a simple problem. 
Suppose we consider a functional of the form (32.2) and, according to the growth 
restrictions on the integrand $F$, we decide to work in a Sobolev space $E = W^{1,p}(\Omega)$
or in a closed subspace of this space, $\Omega \subset \mathbb{R}^d$ open. Typically, the restrictions on $F$,
which assure that $f$ is well defined on $E$, imply that $f$ is continuous (for the norm
topology). However, the question when such a functional is lower semicontinuous 
for the weak topology is quite involved, nevertheless a fairly comprehensive answer 
is known (see [3]). Under certain technical assumptions on the integrand $F$ the
functional $f$ is lower semicontinuous for the weak topology on $E = W^{1,p}(\Omega)$ if, and




only if, for (almost) all $(x, u) \in \Omega \times \mathbb{R}^m$ the function $y \mapsto F(x, u, y)$ is convex (if
$m = 1$), respectively quasiconvex (if $m > 1$).

Though in general continuity of a functional for the norm topology does not 
imply its continuity for the weak topology, there is a large and much used class of 
functionals where this implication holds. This is the class of convex functionals and 
for this reason convex minimization is relatively easy. We prepare the proof of this 
important result with a lemma. 


Lemma 33.3 Let $E$ be a Banach space and $M$ a weakly (sequentially) closed subset.
A function $f : M \to \mathbb{R}$ is (sequentially) lower semicontinuous on $M$ for the weak
topology if, and only if, the sublevel sets $[f \le r]$ are weakly (sequentially) closed
for every $r \in \mathbb{R}$.


Proof We give the proof explicitly for the case of sequential convergence. For the 
general case one proceeds in the same way using nets. 

Let $f$ be weakly sequentially lower semicontinuous and for some $r \in \mathbb{R}$ let $(x_n)_{n \in \mathbb{N}}$
be a sequence in $[f \le r]$ which converges weakly to some point $x \in M$ (since $M$
is weakly sequentially closed). By Lemma 33.1 we know $f(x) \le \liminf_{n \to \infty} f(x_n)$
and therefore $f(x) \le r$, i.e., $x \in [f \le r]$. Therefore, $[f \le r]$ is closed.

Conversely assume that all the sublevel sets $[f \le r]$, $r \in \mathbb{R}$, are weakly
sequentially closed. Suppose $f$ is not weakly sequentially lower semicontinuous
on $M$. Then there is a weakly convergent sequence $(x_n)_{n \in \mathbb{N}} \subset M$ with limit
$x \in M$ such that $\liminf_{n \to \infty} f(x_n) < f(x)$. Choose a real number $r$ such that
$\liminf_{n \to \infty} f(x_n) < r < f(x)$. Then there is a subsequence $y_j = x_{n(j)} \subset [f \le r]$.
This subsequence too converges weakly to $x$ and, since $[f \le r]$ is weakly
sequentially closed, we know $x \in [f \le r]$, a contradiction. We conclude that $f$ is
sequentially lower semicontinuous for the weak topology.


Lemma 33.4 Let $E$ be a Banach space, $M$ a convex closed subset and $f : M \to \mathbb{R}$
a continuous convex function. Then $f$ is lower semicontinuous on $M$ for the weak
topology.


Proof Because $f$ is continuous (for the norm topology) the sublevel sets $[f \le r]$,
$r \in \mathbb{R}$, are all closed. Since $f$ is convex these sublevel sets are convex subsets of
$E$ ($x, y \in [f \le r]$, $0 \le t \le 1 \Rightarrow f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y) \le
t r + (1 - t) r = r$). As in Hilbert spaces one knows that a convex subset is closed
if, and only if, it is weakly closed. We deduce that all the sublevel sets are weakly
closed and conclude by Lemma 33.3.

As a conclusion to this section we present a summary of our discussion in the form
of two explicit results on the existence of a minimizer in reflexive Banach spaces.


Theorem 33.3 (Generalized Weierstraß Theorem I) A weakly sequentially lower
semicontinuous function $f$ attains its infimum on a bounded and weakly sequentially
closed subset $M$ of a real reflexive Banach space $E$, i.e., there is $x_0 \in M$ such that

$$f(x_0) = \inf_{x \in M} f(x).$$




Proof All the sublevel sets $[f \le r]$, $r \in \mathbb{R}$, are bounded and therefore relatively
weakly compact since we are in a reflexive Banach space (see Lemma 33.2). Now
Lemma 33.3 implies that all hypotheses of Theorem 33.1 are satisfied. Thus we 
conclude by this theorem. 

In Theorem 33.3, one can replace the assumption that the set M is bounded by an 
assumption on the function f which implies that the sublevel sets of f are bounded. 
Then one obtains another generalized Weierstraß theorem.


Theorem 33.4 (Generalized Weierstraß Theorem II) Let $E$ be a reflexive Banach
space, $M \subset E$ a weakly (sequentially) closed subset, and $f : M \to \mathbb{R}$ a weakly
(sequentially) lower semicontinuous function on $M$. If $f$ is coercive, i.e., if $\|x\| \to \infty$
implies $f(x) \to +\infty$, then $f$ has a finite minimum on $M$, i.e., there is an $x_0 \in M$
such that

$$f(x_0) = \inf_{x \in M} f(x).$$


Proof Since $f$ is coercive the sublevel sets $[f \le r]$ are not empty for sufficiently
large r and are bounded. We conclude as in the previous result. 

For other variants of generalized Weierstraß theorems we refer to [4]. Detailed
results on the minimization of functionals of the form (32.2) can be found in [5-7]. 


33.3 Minimization of Special Classes of Functionals


For a self-adjoint compact operator $A$ in the complex Hilbert space $\mathcal{H}$ consider the
sesquilinear function $Q : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$ defined by $Q(x, y) = \langle x, Ay \rangle + r \langle x, y \rangle$
for $r = \|A\| + c$ for some $c > 0$. This function has the following properties:
$Q(x, x) \ge c \|x\|^2$ for all $x \in \mathcal{H}$ and for fixed $x \in \mathcal{H}$ the function $y \mapsto Q(x, y)$
is weakly continuous (since a compact operator maps weakly convergent sequences
onto norm convergent ones). Then $f(x) = Q(x, x)$ is a concrete example of a
quadratic functional on $\mathcal{H}$ which has a unique minimum on closed balls $B_r$ of $\mathcal{H}$. This
minimization is actually a special case of the following result on the minimization 
of quadratic functionals on reflexive Banach spaces. 


Theorem 33.5 (Minimization of Quadratic Forms) Let $E$ be a reflexive Banach
space and $Q$ a symmetric sesquilinear form on $E$ having the following properties:
There is a constant $c > 0$ such that $Q(x, x) \ge c \|x\|^2$ for all $x \in E$ and for fixed
$x \in E$ the functional $y \mapsto Q(x, y)$ is weakly continuous on $E$. Then, for every
$u \in E'$ and every $r > 0$, there is exactly one point $x_0 = x_0(u, r)$ which minimizes the
functional

$$f(x) = Q(x, x) - \operatorname{Re} u(x), \qquad x \in E$$

on the closed ball $B_r = \{x \in E : \|x\| \le r\}$, i.e.,

$$f(x_0) = \inf_{x \in B_r} f(x).$$




Proof Consider $x, y \in E$ and $0 < t < 1$, then a straightforward calculation gives

$$f(tx + (1 - t)y) = t f(x) + (1 - t) f(y) - t(1 - t) Q(x - y, x - y) < t f(x) + (1 - t) f(y)$$

for all $x, y \in E$, $x \ne y$, since then $t(1 - t) Q(x - y, x - y) > 0$, hence the functional
$f$ is strictly convex and thus has at most one minimizing point by Theorem 33.2.

Suppose a sequence $(x_n)_{n \in \mathbb{N}}$ in $E$ converges weakly to $x_0 \in E$. Since
$Q(x_n, x_n) = Q(x_0, x_0) + Q(x_0, x_n - x_0) + Q(x_n - x_0, x_0) + Q(x_n - x_0, x_n - x_0)$
and since $Q$ is strictly positive it follows that

$$Q(x_n, x_n) \ge Q(x_0, x_0) + Q(x_n - x_0, x_0) + Q(x_0, x_n - x_0)$$

for all $n \in \mathbb{N}$. Since $Q$ is symmetric and weakly continuous in the second argument
the last two terms converge to 0 as $n \to \infty$ and this estimate implies

$$\liminf_{n \to \infty} Q(x_n, x_n) \ge Q(x_0, x_0).$$


Therefore, the function $x \mapsto Q(x, x)$ is weakly lower semicontinuous, thus, for
every $u \in E'$, $x \mapsto f(x) = Q(x, x) - \operatorname{Re} u(x)$ is weakly lower semicontinuous on
$E$ and we conclude by Theorem 33.3. (Observe that the closed balls $B_r$ are weakly
closed, as closed convex sets.) 


Corollary 33.1 Let $A$ be a bounded symmetric operator in a complex Hilbert space
$\mathcal{H}$ which is strictly positive, i.e., there is a constant $c > 0$ such that $\langle x, Ax \rangle \ge c \langle x, x \rangle$
for all $x \in \mathcal{H}$. Then, for every $y \in \mathcal{H}$ the function $x \mapsto f(x) = \langle x, Ax \rangle - \operatorname{Re} \langle y, x \rangle$
has a unique minimizing point $x_0 = x_0(y, r)$ on every closed ball $B_r$, i.e., there is
exactly one $x_0 \in B_r$ such that

$$f(x_0) = \inf_{x \in B_r} f(x).$$


Proof Using the introductory remark to this section one verifies easily that $Q(x, y) =
\langle x, Ay \rangle$ satisfies the hypothesis of Theorem 33.5.


33.4 Exercises 


1. Prove Lemma 33.1. 

2. Show without the use of Lemma 33.3 that the norm ||-|| on a Banach space E is 
weakly lower semicontinuous. 
Hints: Recall that $\|x_0\| = \sup_{u \in E',\, \|u\| \le 1} |u(x_0)|$ for $x_0 \in E$. If a sequence $(x_n)_{n \in \mathbb{N}}$
converges weakly to $x_0$, then for every $u \in E'$ one knows $u(x_0) = \lim_{n \to \infty} u(x_n)$.

3. Prove: The functional

$$f(u) = \int_0^1 (t\, u'(t))^2 \, dt,$$

defined on all continuous functions on $[0, 1]$ which have a weak derivative $u' \in
L^2(0, 1)$ and which satisfy $u(0) = 0$ and $u(1) = 1$, has 0 as infimum and there is
no function in this class at which the infimum is attained.




4. On the space $E = C^1([-1, 1], \mathbb{R})$ define the functional

$$f(u) = \int_{-1}^{1} (t\, u'(t))^2 \, dt$$

and show that it has no minimum under the boundary conditions $u(\pm 1) = \pm 1$.
Hint: This variation of the previous problem is due to Weierstraß. Show first that
on the class of functions $u_\varepsilon$, $\varepsilon > 0$, defined by

$$u_\varepsilon(x) = \frac{\arctan \frac{x}{\varepsilon}}{\arctan \frac{1}{\varepsilon}},$$

the infimum of $f$ is zero.
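The hint in Exercise 4 can be explored numerically before attempting a proof. The following is only a sketch, assuming the functional is the integral of $(t\,u'(t))^2$ over $[-1, 1]$ and the family $u_\varepsilon$ is as stated in the hint; the function names and the midpoint rule are illustrative choices, not part of the text. It evaluates $f(u_\varepsilon)$ for decreasing $\varepsilon$ and compares the result with a value obtained by integrating $t^2/(\varepsilon^2 + t^2)^2$ by hand.

```python
import math


def u_eps_prime(t, eps):
    """Derivative of u_eps(t) = arctan(t/eps) / arctan(1/eps)."""
    return eps / (eps * eps + t * t) / math.atan(1.0 / eps)


def weierstrass_functional(eps, n=200_000):
    """Midpoint-rule approximation of the integral of (t * u_eps'(t))**2 over [-1, 1]."""
    step = 2.0 / n
    total = 0.0
    for j in range(n):
        t = -1.0 + (j + 0.5) * step  # midpoint of the j-th subinterval
        total += (t * u_eps_prime(t, eps)) ** 2
    return total * step


def closed_form(eps):
    """Value of f(u_eps) from the antiderivative of t^2 / (eps^2 + t^2)^2."""
    a = math.atan(1.0 / eps)
    return eps / a - eps ** 2 / ((1.0 + eps ** 2) * a ** 2)


# f(u_eps) for a few decreasing values of eps (hypothetical sample points).
values = [weierstrass_functional(eps) for eps in (1.0, 0.1, 0.01)]
for eps, value in zip((1.0, 0.1, 0.01), values):
    print(eps, value, closed_form(eps))
```

The printed values decrease towards zero as $\varepsilon$ shrinks, which is consistent with the claim of the hint but of course does not replace the argument asked for in the exercise.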


References 

1. Lieb E, Loss M. Analysis. Graduate studies in mathematics, vol 14. 2nd ed. Providence: 
American Mathematical Society; 2001. 

2. Blanchard Ph, Brüning E. Variational methods in mathematical physics. A unified approach.
Texts and monographs in physics. Berlin: Springer; 1992. 

3. Dacorogna B. Weak continuity and weak lower semicontinuity of non-linear functionals. Lecture 
notes in mathematics, vol 922. Berlin: Springer; 1982. 

4. Zeidler E. Variational methods and optimization. Nonlinear functional analysis and its 
applications, vol 3. New York: Springer; 1985. 

5. Dacorogna B. Direct methods in the calculus of variations. Applied mathematical sciences, 
vol 78. Berlin: Springer; 1989. 

6. Jost J, Li-Jost X. Calculus of variations. Cambridge studies in advanced mathematics, vol 64. 
Cambridge: Cambridge University Press; 1998. 

7. Struwe M. Variational methods: applications to nonlinear partial differential equations and
Hamiltonian systems. Ergebnisse der Mathematik und ihrer Grenzgebiete, Folge 3, vol 34.
3rd ed. Berlin: Springer; 2000.


Chapter 34 
Differential Calculus on Banach Spaces 
and Extrema of Functions 


As is well known from calculus on finite-dimensional Euclidean spaces, the behavior 
of a sufficiently smooth function f in a neighborhood of some point xo is determined 
by the first few derivatives $f^{(n)}(x_0)$, $n \le m$, of $f$ at this point, $m \in \mathbb{N}$ depending on
$f$ and the intended accuracy. For example, if $f$ is a twice continuously differentiable
real-valued function on the open interval $\Omega \subset \mathbb{R}$ and $x_0 \in \Omega$, the Taylor expansion
of order 2

$$f(x) = f(x_0) + f^{(1)}(x_0)(x - x_0) + \frac{1}{2!} f^{(2)}(x_0)(x - x_0)^2 + (x - x_0)^2 R_2(x, x_0) \tag{34.1}$$

with $\lim_{x \to x_0} R_2(x, x_0) = 0$ is available, and on the basis of this representation the
values of $f^{(1)}(x_0)$ and $f^{(2)}(x_0)$ determine whether $x_0$ is a critical point of the function
$f$, or a local minimum, or a local maximum, or an inflection point.

In variational problems too, one has to determine whether a function $f$ has
critical points, local minima or maxima or inflection points, but in these problems the
underlying spaces are typically infinite-dimensional Banach spaces. Accordingly an 
expansion of the form (34.1) in this infinite-dimensional case can be expected to be 
an important tool too. Obviously one needs differential calculus on Banach spaces 
to achieve this goal. 

Recall that differentiability of a real-valued function $f$ on an open interval $\Omega$ at
a point $x_0 \in \Omega$ is equivalent to the existence of a proper tangent to the graph of
the function through the point $(x_0, f(x_0)) \in \mathbb{R}^2$. A proper tangent means that the
difference between the values of the tangent and of the function $f$ at a point $x \in \Omega$
is of higher order in $x - x_0$ than the linear term. Since the tangent has the equation
$y(x) = f'(x_0)(x - x_0) + f(x_0)$ this approximation means

$$f(x) - y(x) = f(x) - f'(x_0)(x - x_0) - f(x_0) = o(x - x_0), \tag{34.2}$$

where $o$ is some function on $\mathbb{R}$ with the properties $o(0) = 0$ and
$\lim_{h \to 0,\, h \ne 0} \frac{o(h)}{h} = 0$. In the case of a real-valued function of several
variables, the tangent plane takes the role of the tangent line. As we are going to show,
this way to look at differentiability has
a natural counterpart for functions defined on infinite-dimensional Banach spaces.


© Springer International Publishing Switzerland 2015
P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_34 




34.1 The Fréchet Derivative 


Let $E$, $F$ be two real Banach spaces with norms $\|\cdot\|_E$ and $\|\cdot\|_F$, respectively. As
usual $\mathcal{L}(E, F)$ denotes the space of all continuous linear operators from $E$ into $F$.
By Theorem 22.2, the space $\mathcal{L}(E, F)$ is a real Banach space too. The order symbol
$o$ denotes any function $E \to F$ which is of higher than linear order in its argument,
i.e., any function satisfying

$$\lim_{h \to 0,\, h \in E \setminus \{0\}} \frac{\|o(h)\|_F}{\|h\|_E} = 0. \tag{34.3}$$


Definition 34.1 Let $U \subset E$ be a nonempty open subset of the real Banach space $E$
and $f : U \to F$ a function from $U$ into the real Banach space $F$. $f$ is called Fréchet
differentiable at a point $x_0 \in U$ if, and only if, there is an $\ell \in \mathcal{L}(E, F)$ such that

$$f(x) = f(x_0) + \ell(x - x_0) + o(x_0; x - x_0) \quad \forall x \in U. \tag{34.4}$$


If $f$ is differentiable at $x_0 \in U$ the continuous linear operator $\ell \in \mathcal{L}(E, F)$ is called
the derivative of $f$ at $x_0$ and is denoted by

$$f'(x_0) = D_{x_0} f = Df(x_0) = \ell. \tag{34.5}$$


If $f$ is differentiable at every point $x_0 \in U$, $f$ is called differentiable on $U$ and the
function $Df : U \to \mathcal{L}(E, F)$ which assigns to every point $x_0 \in U$ the derivative
$Df(x_0)$ of $f$ at $x_0$ is called the derivative of the function $f$.
If the derivative $Df : U \to \mathcal{L}(E, F)$ is continuous, the function $f$ is called
continuously differentiable on $U$ or of class $C^1$, also denoted by $f \in C^1(U, F)$.
This definition is indeed meaningful because of the following


Lemma 34.1 Under the assumptions of Definition 34.1 there is at most one $\ell \in
\mathcal{L}(E, F)$ satisfying Eq. (34.4).


Proof Suppose there are $\ell_1, \ell_2 \in \mathcal{L}(E, F)$ satisfying Eq. (34.4). Then, for all $h \in B_r$,
where $B_r$ denotes an open ball in $E$ with center 0 and radius $r > 0$ such that $x_0 + B_r \subset
U$, we have $f(x_0) + \ell_1(h) + o_1(x_0, h) = f(x_0 + h) = f(x_0) + \ell_2(h) + o_2(x_0, h)$,
and hence the linear operator $\ell = \ell_2 - \ell_1$ satisfies $\ell(h) = o_1(x_0, h) - o_2(x_0, h)$
for all $h \in B_r$. A continuous linear operator can be of higher than linear order on an
open ball only if it is the null operator (see Exercises). This proves $\ell = 0$ and thus
uniqueness.

Definition 34.1 is easy to apply. Suppose $f : U \to F$ is constant, i.e., for some
$a \in F$ we have $f(x) = a$ for all $x \in U \subset E$. Then $f(x) = f(x_0)$ for all $x, x_0 \in U$
and with the choice of $\ell = 0 \in \mathcal{L}(E, F)$ condition (34.4) is satisfied. Thus, $f$ is
continuously Fréchet differentiable on $U$ with derivative zero.

As another simple example consider the case where E is some real Hilbert space 
with inner product $\langle \cdot, \cdot \rangle$ and $F = \mathbb{R}$. For a continuous linear operator $A : E \to E$
define a function $f : E \to \mathbb{R}$ by $f(x) = \langle x, Ax \rangle$ for all $x \in E$. For $x, h \in E$ we




calculate $f(x + h) = f(x) + \langle A^* x + Ax, h \rangle + f(h)$. $h \mapsto \langle A^* x + Ax, h \rangle$ is certainly
a continuous linear functional $E \to \mathbb{R}$ and $f(h) = o(h)$ is obviously of higher than
linear order (actually second order) in $h$. Hence, $f$ is Fréchet differentiable on $E$
with derivative $f'(x) \in \mathcal{L}(E, \mathbb{R})$ given by $f'(x)(h) = \langle A^* x + Ax, h \rangle$ for all $h \in E$.
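The formula for the derivative of the quadratic function can be checked in the finite-dimensional case $E = \mathbb{R}^2$ with the Euclidean inner product, where $A^*$ is the transposed matrix. The following is a minimal sketch under that assumption; the matrix, the vectors, and all function names are hypothetical sample choices. It verifies that the remainder $f(x + sh) - f(x) - f'(x)(sh)$ equals $s^2 f(h)$ and therefore is of higher than linear order in $s$.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def transpose(A):
    """Adjoint of A with respect to the Euclidean inner product."""
    return [list(col) for col in zip(*A)]


def f(A, x):
    """Quadratic function f(x) = <x, A x>."""
    return dot(x, matvec(A, x))


def frechet_derivative(A, x, h):
    """f'(x)(h) = <A* x + A x, h>."""
    ax = matvec(A, x)
    atx = matvec(transpose(A), x)
    return dot([p + q for p, q in zip(atx, ax)], h)


# Sample data (assumptions for illustration only); A is deliberately not symmetric.
A = [[2.0, 1.0], [-3.0, 0.5]]
x = [1.0, -2.0]
h = [0.3, 0.4]


def remainder(s):
    """f(x + s h) - f(x) - f'(x)(s h), which should equal s^2 f(h)."""
    sh = [s * hi for hi in h]
    shifted = [xi + shi for xi, shi in zip(x, sh)]
    return f(A, shifted) - f(A, x) - frechet_derivative(A, x, sh)


# Remainder divided by the step size: tends to 0 as s -> 0.
ratios = [remainder(s) / s for s in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

Since the remainder is exactly the quadratic term $f(sh)$, the ratios shrink linearly with $s$, which matches the statement that $f(h)$ is of second order in $h$.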

In the Exercises, the reader will be invited to show that the above definition of dif- 
ferentiability reproduces the well-known definitions of differentiability for functions 
of finitely many variables. 

The Fréchet derivative has all the properties which are well known for the 
derivative of functions of one real variable. Indeed the following results hold. 


Proposition 34.1 Let $U \subset E$ be an open nonempty subset of the Banach space $E$
and $F$ some other real Banach space.

a) The Fréchet derivative $D$ is a linear mapping $C^1(U, F) \to C(U, \mathcal{L}(E, F))$, i.e., for all
$f, g \in C^1(U, F)$ and all $a, b \in \mathbb{R}$ one has

$$D(af + bg) = a\,Df + b\,Dg.$$

b) The chain rule holds for the Fréchet derivative $D$: Let $V \subset F$ be an open set
containing $f(U)$ and $G$ a third real Banach space. Then for all $f \in C^1(U, F)$
and all $g \in C^1(V, G)$ we have $g \circ f \in C^1(U, G)$ and for all $x \in U$

$$D(g \circ f)(x) = (Dg)(f(x)) \circ (Df)(x).$$


Proof The proof of the first part is left as an exercise.
Since $f$ is differentiable at $x \in U$ we know

$$f(x + h) - f(x) = f'(x)(h) + o_1(h) \quad \forall h \in B_r, \quad x + B_r \subset U$$

and similarly, since $g$ is differentiable at $y = f(x) \in V$,

$$g(y + k) - g(y) = g'(y)(k) + o_2(k) \quad \forall k \in B_\rho, \quad y + B_\rho \subset V.$$

Since $f$ is continuous one can find, for the radius $\rho > 0$ in the differentiability
condition for $g$, a radius $r > 0$ such that $f(x + B_r) \subset y + B_\rho$ and such that the differentiability
condition for $f$ holds. Then, for all $h \in B_r$, the following chain of identities holds,
taking the above differentiability conditions into account:

$$\begin{aligned}
g \circ f(x + h) - g \circ f(x) &= g[f(x + h)] - g[f(x)] \\
&= g[f(x) + f'(x)(h) + o_1(h)] - g[f(x)] \\
&= g'(y)(f'(x)(h) + o_1(h)) + o_2(f'(x)(h) + o_1(h)) \\
&= g'(y)(f'(x)(h)) + o(h),
\end{aligned}$$

where

$$o(h) = g'(y)(o_1(h)) + o_2(f'(x)(h) + o_1(h))$$

is indeed a higher order term as shown in the Exercises. Thus, we conclude.




Higher order derivatives can be defined in the same way. Suppose $E$, $F$ are two real
Banach spaces and $U \subset E$ is open and nonempty. Given a function $f \in C^1(U, F)$
we know $f' \in C(U, \mathcal{L}(E, F))$, i.e., the derivative is a continuous function on $U$ with
values in the Banach space $\mathcal{L}(E, F)$. If this function $f'$ is differentiable at $x_0 \in U$
(on $U$), the function $f$ is called twice differentiable at $x_0 \in U$ (on $U$) and its second
derivative is denoted by

$$D^2 f(x_0) = f^{(2)}(x_0) = D^2_{x_0} f = D(f')(x_0). \tag{34.6}$$


According to Definition 34.1 and Eq. (34.6), the second derivative of $f : U \to
F$ is a continuous linear operator $E \to \mathcal{L}(E, F)$, i.e., an element of the space
$\mathcal{L}(E, \mathcal{L}(E, F))$. There is a natural isomorphism of the space of continuous linear
operators from $E$ into the space of continuous linear operators from $E$ into $F$ and
the space $\mathcal{B}(E \times E, F)$ of continuous bilinear operators from $E \times E$ into $F$,

$$\mathcal{L}(E, \mathcal{L}(E, F)) \cong \mathcal{B}(E \times E, F). \tag{34.7}$$


This natural isomorphism is defined and studied in the Exercises. Thus, the second
derivative $D^2 f(x_0)$ at a point $x_0 \in U$ is considered as a continuous bilinear map
$E \times E \to F$. If the second derivative $D^2 f : U \to \mathcal{B}(E \times E, F)$ exists on $U$ and is
continuous, the function $f$ is said to be of class $C^2$ and we write $f \in C^2(U, F)$.
The derivatives of higher order are defined in the same way. The derivative of order
$n \ge 3$ is the derivative of the derivative of order $n - 1$, according to Definition 34.1:

$$D^n f(x_0) = D(D^{n-1} f)(x_0). \tag{34.8}$$


In order to describe $D^n f(x_0)$ conveniently we extend the isomorphism (34.7) to
higher orders. Denote by $E^{\times n} = E \times \cdots \times E$ ($n$ factors) and by $\mathcal{B}(E^{\times n}, F)$ the
Banach space of all continuous $n$-linear operators $E^{\times n} \to F$. In the Exercises, one
shows for $n = 3, 4, \ldots$

$$\mathcal{L}(E, \mathcal{B}(E^{\times (n-1)}, F)) \cong \mathcal{B}(E^{\times n}, F). \tag{34.9}$$


Under this isomorphism the third derivative at some point $x_0 \in U$ is then a continuous
3-linear map $E^{\times 3} \to F$, $D^3 f(x_0) \in \mathcal{B}(E^{\times 3}, F)$. Using the isomorphisms (34.9), the
higher order derivatives are

$$D^n f(x_0) \in \mathcal{B}(E^{\times n}, F) \tag{34.10}$$

if they exist. If $D^n f : U \to \mathcal{B}(E^{\times n}, F)$ is continuous the function $f$ is called $n$-times
continuously differentiable or of class $C^n$. Then we write $f \in C^n(U, F)$.

As an illustration we calculate the second derivative of the function $f(x) =
\langle x, Ax \rangle$ on a real Hilbert space $E$ with inner product $\langle \cdot, \cdot \rangle$, $A$ a bounded linear operator
on $E$. The first Fréchet derivative has been calculated, $f'(x_0)(y) = \langle (A + A^*) x_0, y \rangle$ for
all $y \in E$. In order to determine the second derivative we evaluate $f'(x_0 + h) - f'(x_0)$.
For all $y \in E$ one finds through a simple calculation

$$(f'(x_0 + h) - f'(x_0))(y) = \langle (A + A^*) h, y \rangle.$$

Hence, the second derivative of $f$ exists and is given by the continuous bilinear form
$(D^2 f)(x_0)(y_1, y_2) = \langle (A + A^*) y_1, y_2 \rangle$, $y_1, y_2 \in E$. We see in this example that the




second derivative is actually a symmetric bilinear form. With some effort this can be 
shown for every twice differentiable function. 

As we have mentioned, the first few derivatives of a differentiable function $f :
U \to F$ at a point $x_0 \in U$ control the behavior of the function in a sufficiently small
neighborhood of this point. The key to this connection is the Taylor expansion with 
remainder. In order to be able to prove this fundamental result in its strongest form 
we need the fundamental theorem of calculus for functions with values in a Banach 
space. And this in turn requires the knowledge of the Riemann integral for functions 
on the real line with values in a Banach space. 

Suppose $E$ is a real Banach space and $u : [a, b] \to E$ a continuous function on
the bounded interval $[a, b]$. In Section 27.2 on the integration of spectral families,
we had introduced partitions $Z$ of the interval $[a, b]$. Roughly, a partition $Z$ of the
interval $[a, b]$ is an ordered family of points $a = t_0 < t_1 < t_2 < \cdots < t_n = b$
and of some points $t'_j \in (t_{j-1}, t_j]$, $j = 1, \ldots, n$. For each partition we introduce the
approximating sums

$$\Sigma(u, Z) = \sum_{j=1}^{n} u(t'_j)(t_j - t_{j-1}).$$

By forming the joint refinement of two partitions one shows, as in Sect. 26.2 on the
integration of spectral families, the following result: Given $\varepsilon > 0$ there is $\delta > 0$ such
that

$$\|\Sigma(u, Z) - \Sigma(u, Z')\|_E < \varepsilon$$

for all partitions $Z, Z'$ with $|Z'|, |Z| < \delta$, $|Z| = \max\{t_j - t_{j-1} : j = 1, \ldots, n\}$.
This estimate implies that the approximating sums $\Sigma(u, Z)$ have a limit with respect
to partitions $Z$ with $|Z| \to 0$.


Theorem 34.1 Suppose $E$ is a real Banach space and $u : [a, b] \to E$ a continuous
function. Then $u$ has an integral over this finite interval, defined by the following
limit in $E$:

$$\int_a^b u(t)\,dt = \lim_{|Z| \to 0} \Sigma(u, Z). \tag{34.11}$$

This integral of functions with values in a Banach space has the standard properties,
i.e., it is linear in the integrand, additive in the interval of integration, and is bounded
by the maximum of the function multiplied by the length of the integration interval:

$$\left\| \int_a^b u(t)\,dt \right\|_E \le (b - a) \max_{a \le t \le b} \|u(t)\|_E.$$

Proof It is straightforward to verify that the approximating sums $\Sigma(u, Z)$ are linear
in $u$ and additive in the interval of integration. The basic rules of calculation for limits
then prove the statements for the integral. For the estimate observe

$$\|\Sigma(u, Z)\|_E \le \sum_{j=1}^{n} \|u(t'_j)\|_E (t_j - t_{j-1}) \le \sup_{a \le t \le b} \|u(t)\|_E \sum_{j=1}^{n} (t_j - t_{j-1}),$$

which implies the above estimate for the approximating sums. Thus we conclude.
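The approximating sums and the norm bound of Theorem 34.1 can be illustrated in the simple case $E = \mathbb{R}^2$ with the Euclidean norm. The following sketch assumes the sample function $u(t) = (\cos t, \sin t)$ on $[0, \pi]$, an equidistant partition, and right end points as tags $t'_j = t_j$; these choices and the function names are illustrative assumptions, not taken from the text.

```python
import math


def riemann_sum(u, a, b, n):
    """Approximating sum of u over [a, b] for an equidistant partition.

    The tag in each subinterval (t_{j-1}, t_j] is the right end point t_j.
    """
    step = (b - a) / n
    total = [0.0, 0.0]
    for j in range(1, n + 1):
        t_tag = a + j * step
        value = u(t_tag)
        total = [s + v * step for s, v in zip(total, value)]
    return total


def u(t):
    """Sample continuous function [0, pi] -> R^2 with Euclidean norm 1 everywhere."""
    return (math.cos(t), math.sin(t))


approx = riemann_sum(u, 0.0, math.pi, 10_000)
norm = math.hypot(*approx)   # Euclidean norm of the approximate integral
bound = math.pi * 1.0        # (b - a) times the maximum of the norm of u(t)
print(approx, norm, bound)
```

Integrating componentwise gives the exact value $(0, 2)$, so the norm of the integral is 2, which is indeed below the bound $(b - a)\max_t \|u(t)\|_E = \pi$.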


Corollary 34.1 (Fundamental Theorem of Calculus) Let $E$ be a real Banach
space, $[a, b]$ a finite interval and $u : [a, b] \to E$ a continuous function. For some
$e \in E$ define a function $v : [a, b] \to E$ by

$$v(t) = e + \int_a^t u(s)\,ds \quad \forall t \in [a, b]. \tag{34.12}$$

Then $v$ is continuously differentiable with derivative $v'(t) = u(t)$ and one thus has
for all $a \le c < d \le b$,

$$v(d) - v(c) = \int_c^d v'(t)\,dt. \tag{34.13}$$


Proof We prove differentiability of $v$ at some interior point $t \in (a, b)$. At the
end points of the interval the usual modifications apply. Suppose $\tau > 0$ such that
$t + \tau \in [a, b]$. Then, by definition of $v$,

$$v(t + \tau) - v(t) = \int_a^{t+\tau} u(s)\,ds - \int_a^t u(s)\,ds = \int_t^{t+\tau} u(s)\,ds$$

since

$$\int_a^{t+\tau} u(s)\,ds = \int_a^t u(s)\,ds + \int_t^{t+\tau} u(s)\,ds.$$

The basic bound for integrals gives

$$\left\| \int_t^{t+\tau} [u(s) - u(t)]\,ds \right\|_E \le \tau \sup_{t \le s \le t+\tau} \|u(s) - u(t)\|_E$$

and thus proves that this integral is of higher order in $\tau$. We deduce $v(t + \tau) =
v(t) + \tau u(t) + o(\tau)$ and conclude that $v$ is differentiable at $t$ with derivative $v'(t) = u(t)$.
The rest of the proof is standard.

Theorem 34.2 (Taylor Expansion with Remainder) Suppose $E$, $F$ are real Banach
spaces, $U \subset E$ an open and nonempty subset, and $f \in C^n(U, F)$. Given
$x_0 \in U$ choose $r > 0$ such that $x_0 + B_r \subset U$, where $B_r$ is the open ball in $E$
with center 0 and radius $r$. Then for all $h \in B_r$ we have, using the abbreviation
$(h)^k = (h, \ldots, h)$, $k$ terms,

$$f(x_0 + h) = \sum_{k=0}^{n} \frac{1}{k!} f^{(k)}(x_0)(h)^k + R_n(x_0, h), \tag{34.14}$$

where the remainder $R_n$ has the form

$$R_n(x_0, h) = \frac{1}{(n-1)!} \int_0^1 (1 - t)^{n-1} \left[ f^{(n)}(x_0 + th) - f^{(n)}(x_0) \right](h)^n \, dt \tag{34.15}$$

and thus is of order $o((h)^n)$, i.e.,

$$\lim_{h \to 0,\, h \in E \setminus \{0\}} \frac{\|R_n(x_0, h)\|_F}{\|h\|_E^n} = 0.$$


Proof Basically the Taylor formula is obtained by applying the fundamental theorem 
of calculus repeatedly (n times) and transforming the multiple integral which is 
generated in this process by a change of the integration order into a one-dimensional 
integral. 

However, there is a simplification of the proof based on the following observation 
(see [1]). Let $v$ be a function on $[0, 1]$ which is $n$ times continuously differentiable,
then

$$\frac{d}{dt} \sum_{k=0}^{n-1} \frac{(1 - t)^k}{k!} v^{(k)}(t) = \frac{(1 - t)^{n-1}}{(n-1)!} v^{(n)}(t) \quad \forall t \in [0, 1].$$

The proof of this identity follows simply by differentiation and grouping terms 
together appropriately. 

Integrate this identity for the function $v(t) = f(x_0 + th)$. Since $f \in C^n(U, F)$ the
application of the chain rule yields for $h \in B_r$,

$$v^{(k)}(t) = f^{(k)}(x_0 + th)(h)^k$$

and thus the result of this integration is, using Eq. (34.13),

$$f(x_0 + h) = \sum_{k=0}^{n-1} \frac{1}{k!} f^{(k)}(x_0)(h)^k + R$$

with remainder

$$R = \frac{1}{(n-1)!} \int_0^1 (1 - t)^{n-1} f^{(n)}(x_0 + th)(h)^n \, dt$$

which can be written as

$$R = \frac{1}{n!} f^{(n)}(x_0)(h)^n + \frac{1}{(n-1)!} \int_0^1 (1 - t)^{n-1} \left[ f^{(n)}(x_0 + th) - f^{(n)}(x_0) \right](h)^n \, dt.$$

The differentiability assumption for $f$ implies that the function $h \mapsto f^{(n)}(x_0 + th)$
from $B_r$ into $\mathcal{B}(E^{\times n}, F)$ is continuous, hence

$$\left\| f^{(n)}(x_0 + th) - f^{(n)}(x_0) \right\|_{\mathcal{B}(E^{\times n}, F)} \to 0$$

as $h \to 0$. Thus we conclude.




34.2 Extrema of Differentiable Functions 


Taylor's formula (34.14) says that a function $f : U \to F$ of class $C^n$ is approximated
at each point of a neighborhood of some point $x_0 \in U$ by a polynomial of degree $n$,
and the error is of order $o((x - x_0)^n)$. We now apply this approximation for $n = 2$ to
characterize local extrema of a function of class C? in terms of the first and second 
derivative of f. We begin with the necessary definitions. 


Definition 34.2 Let $E$ be a real Banach space, $M \subset E$ a nonempty subset, and
$f : M \to \mathbb{R}$ a real valued function on $M$. A point $x_0 \in M$ is called a local minimum
(maximum) of $f$ on $M$ if there is some $r > 0$ such that

$$f(x_0) \le f(x), \quad (f(x_0) \ge f(x)) \quad \forall x \in M \cap (x_0 + B_r).$$

A local minimum (maximum) is strict if

$$f(x_0) < f(x), \quad (f(x_0) > f(x)) \quad \forall x \in M \cap (x_0 + B_r), \; x \ne x_0.$$

If $f(x_0) \le f(x)$, ($f(x_0) \ge f(x)$) holds for all $x \in M$, we call $x_0$ a global minimum
(maximum).


Definition 34.3 Suppose E,F are two real Banach spaces, U C E an open 
nonempty subset, and $f : U \to F$ a function of class $C^1$. A point $x_0 \in U$ is called
a regular (critical) point of the function f if, and only if, the Fréchet derivative 
Df (xo) of f at xo is surjective (not surjective). 


Remark 34.1 For the case $F = \mathbb{R}$ the Fréchet derivative $Df(x_0) = f'(x_0) \in
\mathcal{L}(E, \mathbb{R})$ is not surjective if, and only if, $f'(x_0) = 0$; hence, the notion of a critical
point introduced above is nothing else than the generalization of the corresponding 
notion introduced in elementary calculus. 

For extremal points which are interior points of the domain M of the function f 
a fairly detailed description can be given. In this situation, we can assume that the 
domain M = U is an open set. 


Theorem 34.3 (Necessary Condition of Euler-Lagrange) Suppose $U$ is an open
nonempty subset of the real Banach space $E$ and $f \in C^1(U, \mathbb{R})$. Then every extremal
point (i.e., every local or global minimum and every local or global maximum) is a 
critical point of f. 


Proof Suppose that $x_0 \in U$ is a local minimum of $f$. Then there is an $r > 0$ such
that $x_0 + B_r \subset U$ and $f(x_0) \le f(x_0 + h)$ for all $h \in B_r$. Since $f \in C^1(U, \mathbb{R})$ Taylor's
formula applies, thus

$$f(x_0) \le f(x_0 + h) = f(x_0) + f'(x_0)(h) + R_1(x_0, h) \quad \forall h \in B_r$$

or

$$0 \le f'(x_0)(h) + R_1(x_0, h) \quad \forall h \in B_r.$$

Choose any $h \in B_r$, $h \ne 0$. Then all $th \in B_r$, $0 < t \le 1$ and therefore $0 \le
f'(x_0)(th) + R_1(x_0, th)$. Since $\lim_{t \to 0} t^{-1} R_1(x_0, th) = 0$ we can divide this inequality
by $t > 0$ and take the limit $t \to 0$. This gives $0 \le f'(x_0)(h)$. This argument applies
to any $h \in B_r$, thus in particular to $-h$ and therefore $0 \le f'(x_0)(-h) = -f'(x_0)(h)$.
We conclude that $0 = f'(x_0)(h)$ for all $h \in B_r$. The open nonempty ball $B_r$ absorbs
the points of $E$, i.e., every point $x \in E$ can be written as $x = \lambda h$ with some
$h \in B_r$ and some $\lambda \in \mathbb{R}$. It follows that $0 = f'(x_0)(x)$ for all $x \in E$ and therefore
$f'(x_0) = 0 \in \mathcal{L}(E, \mathbb{R}) = E'$.

If $x_0 \in U$ is a local maximum of $f$, then this point is a local minimum of $-f$ and
we conclude as above. 


Theorem 34.4 (Necessary and Sufficient Conditions for Local Extrema) Suppose
$U \subset E$ is a nonempty open subset of the real Banach space $E$ and $f \in
C^2(U, \mathbb{R})$.

a) If $f$ has a local minimum at $x_0 \in U$, then the first Fréchet derivative of $f$ vanishes
at $x_0$, $f'(x_0) = 0$, and the second Fréchet derivative of $f$ is nonnegative at $x_0$,
$f^{(2)}(x_0)(h, h) \ge 0$ for all $h \in E$.

b) If conversely $f'(x_0) = 0$ and if the second Fréchet derivative of $f$ is strictly
positive at $x_0$, i.e., if $\inf \{ f^{(2)}(x_0)(h, h) : h \in E, \|h\|_E = 1 \} = c > 0$, then $f$ has
a local minimum at $x_0$.


Proof Suppose $x_0 \in U$ is a local minimum of $f$. Then by Theorem 34.3 $f'(x_0) = 0$.
Since $f \in C^2(U, \mathbb{R})$ Taylor's formula implies

$$f(x_0) \le f(x_0 + h) = f(x_0) + \frac{1}{2!} f^{(2)}(x_0)(h, h) + R_2(x_0, h) \quad \forall h \in B_r \tag{34.16}$$

for some $r > 0$ such that $x_0 + B_r \subset U$. Choose any $h \in B_r$. Then for all $0 < t \le 1$
we know $0 \le \frac{1}{2!} f^{(2)}(x_0)(th, th) + R_2(x_0, th)$ or, after division by $t^2 > 0$,

$$0 \le f^{(2)}(x_0)(h, h) + \frac{2}{t^2} R_2(x_0, th) \quad \forall\, 0 < t \le 1.$$

Since $R_2(x_0, th)$ is a higher order term we know $t^{-2} R_2(x_0, th) \to 0$ as $t \to 0$.
This gives $0 \le f^{(2)}(x_0)(h, h)$ for all $h \in B_r$ and since open balls are absorbing,
$0 \le f^{(2)}(x_0)(h, h)$ for all $h \in E$. This proves part (a).

Conversely assume that f’(xo) = 0 and that f(x) is strictly positive. Choose 
r > Osuch that x9 + B, C U. The second-order Taylor expansion gives 


1 
Fao +h) — fo) = | FO )(h A) + Ro(xo,h) Whe B,, 
and thus for all h € E with ||h||; = l andallO <s <r, 
1 
f (xo + sh) — f (x0) = af Crov(sh, sh) + Rox, sh) 


7 Py fMaomh, h) + s~? Ro(xo, sh)]. 


Since R (xo, sh) is a higher order term there is an sg € (0,r) such that 
|s~? Ro(xo, sh)| < c/2 for all 0 < s < so, and since + f(xo)(h, h) > c/2 for all 


528 34 Differential Calculus on Banach Spaces and Extrema of Functions 


he E, |\|h||z = 1, we get [af Ooh, A) + s~?R5(xo, sh)] = 0 for all O < 5 < 59 
and allh € E, ||h||~ = 1. It follows that f(xo + h) — f(xo) = 0 for all h € B,, and 
therefore the function f has a local minimum at xo. 

As we mentioned before a function f has a local maximum at some point xo if, 
and only if, the function — f has a local minimum at this point. Therefore, Theorem 
34.4 easily implies necessary and sufficient conditions for a local maximum. 
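
The two conditions of Theorem 34.4 can be checked numerically in the finite dimensional case $E = \mathbb{R}^2$. The following is only a minimal sketch under stated assumptions: the quadratic function `f`, the step sizes, and the helper names `gradient`, `second_derivative`, and `min_curvature` are illustrative choices and do not come from the text. The derivatives are approximated by finite differences, and the infimum over the unit sphere is approximated by sampling the unit circle.

```python
import math

def f(x, y):
    # Hypothetical example function (an assumption): a positive definite quadratic form.
    return x * x + x * y + y * y

def gradient(func, x, y, eps=1e-5):
    # Central differences approximate the first derivative f'(x0) in R^2.
    gx = (func(x + eps, y) - func(x - eps, y)) / (2 * eps)
    gy = (func(x, y + eps) - func(x, y - eps)) / (2 * eps)
    return gx, gy

def second_derivative(func, x, y, h, eps=1e-4):
    # f''(x0)(h, h), computed as the second derivative of t -> f(x0 + t h) at t = 0.
    hx, hy = h
    return (func(x + eps * hx, y + eps * hy) - 2 * func(x, y)
            + func(x - eps * hx, y - eps * hy)) / (eps * eps)

def min_curvature(func, x, y, samples=360):
    # Approximates c = inf { f''(x0)(h, h) : |h| = 1 } by sampling the unit circle.
    return min(
        second_derivative(func, x, y, (math.cos(a), math.sin(a)))
        for a in (2 * math.pi * k / samples for k in range(samples))
    )

grad_at_origin = gradient(f, 0.0, 0.0)   # condition f'(x0) = 0
c = min_curvature(f, 0.0, 0.0)           # condition c > 0
```

For this particular `f` one has $f^{(2)}(0)(h,h) = 2 + \sin 2\alpha$ for $h = (\cos\alpha, \sin\alpha)$, so the sampled infimum is close to 1 and part (b) of the theorem applies at the origin.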


34.3 Convexity and Monotonicity 


We begin with the discussion of an interesting connection between convexity of a 
functional and monotonicity of its first Fréchet derivative which has far-reaching 
implications for optimization problems. For differentiable real-valued functions of 
one real variable these results are well known. 

The following theorem states this connection in detail and provides the relevant 
definitions. 


Theorem 34.5 (Convexity-Monotonicity) Let $U$ be a convex open subset of the
real Banach space $E$ and $f \in C^1(U,\mathbb{R})$. Then the following statements are
equivalent:

a) $f$ is convex, i.e., for all $x, y \in U$ and all $0 \le t \le 1$ one has

\[ f(tx + (1 - t)y) \le t f(x) + (1 - t) f(y) \tag{34.17} \]

b) The Fréchet derivative $f' : U \to E'$ is monotone, i.e., for all $x, y \in U$ one has

\[ \langle f'(x) - f'(y), x - y \rangle \ge 0, \tag{34.18} \]

where $\langle \cdot, \cdot \rangle$ denotes the canonical bilinear form on $E' \times E$.


Proof If $f$ is convex, inequality (34.17) implies, for $x, y \in U$ and $0 < t \le 1$,

\[ f(y + t(x - y)) - f(y) \le t f(x) + (1 - t) f(y) - f(y) = t\,(f(x) - f(y)). \]

If we divide this inequality by $t > 0$ and then take the limit $t \to 0$ the result is

\[ \langle f'(y), x - y \rangle \le f(x) - f(y). \]

If we exchange the roles of $x$ and $y$ in this argument, we obtain

\[ \langle f'(x), y - x \rangle \le f(y) - f(x). \]

Now add the two inequalities to get

\[ \langle f'(y), x - y \rangle + \langle f'(x), y - x \rangle \le 0, \]

thus condition (34.18) follows and therefore $f'$ is monotone.




Suppose conversely that the Fréchet derivative $f' : U \to E'$ is monotone. For
$x, y \in U$ consider the function $p : [0,1] \to \mathbb{R}$ defined by
$p(t) = f(tx + (1 - t)y) - t f(x) - (1 - t) f(y)$. This function is differentiable with derivative
$p'(t) = \langle f'(x(t)), x - y \rangle - f(x) + f(y)$, $x(t) = tx + (1 - t)y$, and satisfies
$p(0) = 0 = p(1)$. The convexity condition is equivalent to the condition $p(t) \le 0$
for all $t \in [0,1]$. We prove this condition indirectly. Thus, we assume that there is
some point in $(0,1)$ at which $p$ is positive. Then there is some point $t_0 \in (0,1)$ at
which $p$ attains its positive maximum. For $t \in (0,1)$ calculate

\[ (t - t_0)(p'(t) - p'(t_0)) = (t - t_0) \langle f'(x(t)) - f'(x(t_0)), x - y \rangle
   = \langle f'(x(t)) - f'(x(t_0)), x(t) - x(t_0) \rangle. \]

Since $f'$ is monotone it follows that $(t - t_0)(p'(t) - p'(t_0)) \ge 0$. Since $p$ attains
its maximum at $t_0$, $p'(t_0) = 0$, and thus $(t - t_0) p'(t) \ge 0$, hence $p'(t) \ge 0$ for all
$t_0 < t < 1$. Then $p$ is nondecreasing on $[t_0, 1]$, so $p(1) \ge p(t_0) > 0$, which contradicts
$p(1) = 0$. We conclude $p(t) \le 0$ and thus condition (34.17).
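
The equivalence of Theorem 34.5 can be observed numerically on $E = \mathbb{R}^2$. The sketch below is an illustration under assumptions, not part of the text: the convex function `f`, its gradient `df`, the random sample points, and the helper names are all hypothetical choices. It evaluates the left-hand side of (34.18) and the gap in (34.17) on a finite sample, which can support but of course not prove the statement.

```python
import math
import random

def f(x):
    # Hypothetical convex example on R^2 (an assumption, not from the text).
    return math.exp(x[0]) + math.exp(x[1]) + (x[0] - x[1]) ** 2

def df(x):
    # Its derivative f'(x), identified with the gradient vector.
    d = 2.0 * (x[0] - x[1])
    return (math.exp(x[0]) + d, math.exp(x[1]) - d)

def pairing(x, y):
    # <f'(x) - f'(y), x - y>, the left-hand side of (34.18).
    gx, gy = df(x), df(y)
    return sum((a - b) * (u - v) for a, b, u, v in zip(gx, gy, x, y))

def convexity_gap(x, y, t):
    # t f(x) + (1 - t) f(y) - f(t x + (1 - t) y), nonnegative by (34.17).
    z = tuple(t * u + (1 - t) * v for u, v in zip(x, y))
    return t * f(x) + (1 - t) * f(y) - f(z)

rng = random.Random(0)  # fixed seed so that the sample is reproducible
points = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(50)]
min_pairing = min(pairing(x, y) for x in points for y in points)
min_gap = min(convexity_gap(x, y, t)
              for x in points for y in points for t in (0.25, 0.5, 0.75))
```

Both minima are nonnegative up to rounding errors, in agreement with statements (a) and (b) holding simultaneously for this example.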


Corollary 34.2 Let $U$ be a nonempty convex open subset of the real Banach space
$E$ and $f \in C^1(U,\mathbb{R})$. If $f$ is convex, then every critical point of $f$ is actually a
minimizing point, i.e., a point at which $f$ has a local minimum.


Proof If $x_0 \in U$ is a critical point, there is an $r > 0$ such that $x_0 + B_r \subset U$. Then
for every $h \in B_r$ the points $x(t) = x_0 + th$, $0 \le t \le 1$, belong to $x_0 + B_r$. Since $f$
is differentiable we find

\[ f(x_0 + h) - f(x_0) = \int_0^1 \frac{d}{dt} f(x(t))\, dt = \int_0^1 \langle f'(x(t)), h \rangle\, dt. \]

Since $x(t) - x_0 = th$ and $f'(x_0) = 0$ the last integral can be written as

\[ = \lim_{\varepsilon \to 0} \int_\varepsilon^1 \langle f'(x(t)) - f'(x_0), x(t) - x_0 \rangle\, \frac{dt}{t}. \]

Theorem 34.5 implies that the integrand of this integral is nonnegative, hence
$f(x_0 + h) - f(x_0) \ge 0$ for all $h \in B_r$ and $f$ has a local minimum at the critical point $x_0$.


Corollary 34.3 Let $U$ be a nonempty convex open subset of the real Banach space
$E$ and $f \in C^1(U,\mathbb{R})$. If $f$ is convex, then $f$ is weakly lower semicontinuous.


Proof Suppose that a sequence $(x_n)_{n \in \mathbb{N}} \subset U$ converges weakly to $x_0 \in U$. Again
differentiability of $f$ implies

\[ f(x_n) - f(x_0) = \int_0^1 \frac{d}{dt} f(x_0 + t(x_n - x_0))\, dt
   = \int_0^1 \langle f'(x_0 + t(x_n - x_0)), x_n - x_0 \rangle\, dt \]

\[ = \int_0^1 \langle f'(x_0 + t(x_n - x_0)) - f'(x_0), x_n - x_0 \rangle\, dt + \langle f'(x_0), x_n - x_0 \rangle. \]

As in the proof of the previous corollary, monotonicity of $f'$ implies that the integral
is not negative, hence

\[ f(x_n) - f(x_0) \ge \langle f'(x_0), x_n - x_0 \rangle. \]

As $n \to \infty$ the right-hand side of this estimate converges to 0 and thus
$\liminf_{n \to \infty} f(x_n) - f(x_0) \ge 0$ or $\liminf_{n \to \infty} f(x_n) \ge f(x_0)$. This shows that $f$ is weakly lower
semicontinuous at $x_0$. Since $x_0 \in U$ was arbitrary, we conclude.


Corollary 34.4 Let $U$ be a nonempty convex open subset of the real Banach space
$E$ and $f \in C^2(U,\mathbb{R})$. Then $f$ is convex if, and only if, $f^{(2)}(x_0)$ is nonnegative for all
$x_0 \in U$, i.e., $f^{(2)}(x_0)(h,h) \ge 0$ for all $h \in E$.


Proof By Theorem 34.5 we know that $f$ is convex if, and only if, its Fréchet
derivative $f'$ is monotone. Suppose $f'$ is monotone and $x_0 \in U$. Then there is an
$r > 0$ such that $x_0 + B_r \subset U$ and $\langle f'(x_0 + h) - f'(x_0), h \rangle \ge 0$ for all $h \in B_r$. Since
$f \in C^2(U,\mathbb{R})$, Taylor's Theorem implies that

\[ \langle f'(x_0 + h) - f'(x_0), h \rangle = f^{(2)}(x_0)(h,h) + R_2(x_0,h), \]

hence

\[ 0 \le f^{(2)}(x_0)(h,h) + R_2(x_0,h) \quad \forall\, h \in B_r. \]

Since $R_2(x_0,h) = o(\|h\|^2)$ we deduce, as in the proof of Theorem 34.4, that
$0 \le f^{(2)}(x_0)(h,h)$ for all $h \in E$. Thus $f^{(2)}$ is nonnegative at $x_0 \in U$.
Conversely assume that $f^{(2)}$ is nonnegative on $U$. For $x, y \in U$ we know

\[ \langle f'(x) - f'(y), x - y \rangle = \int_0^1 f^{(2)}(y + t(x - y))(x - y, x - y)\, dt. \]

By assumption the integrand is nonnegative, and it follows that $f'$ is monotone.


34.4 Gateaux Derivatives and Variations


For functions f : R" — R one has the concepts of the total differential and that of 
partial derivatives. The Fréchet derivative has been introduced as the generalization 
of the total differential to the case of infinite-dimensional Banach spaces. Now we 
introduce the Gateaux derivatives as the counterpart of the partial derivatives. 


Definition 34.4 Let $E$, $F$ be two real Banach spaces, $U \subseteq E$ a nonempty open
subset, and $f : U \to F$ a mapping from $U$ into $F$. The Gateaux differential of $f$
at a point $x_0 \in U$ is a mapping $\delta f(x_0, \cdot) : E \to F$ such that, for all $h \in E$,

\[ \lim_{t \to 0} \frac{1}{t} \left( f(x_0 + th) - f(x_0) \right) = \delta f(x_0, h). \tag{34.19} \]

$\delta f(x_0, h)$ is called the Gateaux differential of $f$ at the point $x_0$ in the direction
$h \in E$. If the Gateaux differential of $f$ at $x_0$ is a continuous linear map $E \to F$, one
writes

\[ \delta f(x_0, h) = \delta_{x_0} f(h) \]

and calls $\delta_{x_0} f$ the Gateaux derivative of $f$ at the point $x_0$.
Basic properties of the Gateaux differential, respectively derivative, are collected 
in the following 


Lemma 34.2 Let $E$, $F$ be two real Banach spaces, $U \subseteq E$ a nonempty open subset,
and $f : U \to F$ a mapping from $U$ into $F$.

a) If the Gateaux differential of $f$ exists at a point $x_0 \in U$, it is a homogeneous map
$E \to F$, i.e., $\delta f(x_0, \lambda h) = \lambda\, \delta f(x_0, h)$ for all $\lambda \in \mathbb{R}$ and all $h \in E$;

b) If the Gateaux derivatives exist at a point $x \in U$, they are linear in $f$, i.e., for
$f, g : U \to F$ and $\alpha, \beta \in \mathbb{R}$ one has $\delta_x(\alpha f + \beta g) = \alpha\, \delta_x f + \beta\, \delta_x g$;

c) If $f$ is Gateaux differentiable at a point $x \in U$, then $f$ is continuous at $x$ in every
direction $h \in E$;

d) Suppose $G$ is a third real Banach space, $V \subseteq F$ a nonempty open subset such
that $f(U) \subseteq V$ and $g : V \to G$ a mapping from $V$ into $G$. If $f$ has a Gateaux
derivative at $x \in U$ and $g$ has a Gateaux derivative at $y = f(x)$, then $g \circ f :
U \to G$ has a Gateaux derivative at $x \in U$ and the chain rule

\[ \delta_x(g \circ f) = \delta_y g \circ \delta_x f \]

holds.


Proof Parts (a) and (b) follow easily from the basic rules of calculation for limits. 
Part (c) is obvious from the definitions. The proof of the chain rule is similar but 
easier than the proof of this rule for the Fréchet derivative and thus we leave it as an 
exercise. 

The following result establishes the important connection between Fréchet and 
Gateaux derivatives, as a counterpart of the connection between total differential and 
partial derivatives for functions of finitely many variables. 


Lemma 34.3 Let $E$, $F$ be two real Banach spaces, $U \subseteq E$ a nonempty open subset,
and $f : U \to F$ a mapping from $U$ into $F$.

a) If $f$ is Fréchet differentiable at a point $x \in U$, then $f$ is Gateaux differentiable
at $x$ and both derivatives are equal: $\delta_x f = D_x f$.

b) Suppose that $f$ is Gateaux differentiable at all points in a neighborhood $V$ of
the point $x_0 \in U$ and that $x \mapsto \delta_x f \in L(E,F)$ is continuous on $V$. Then $f$ is
Fréchet differentiable at $x_0$ and $\delta_{x_0} f = D_{x_0} f$.


Proof If $f$ is Fréchet differentiable at $x \in U$ we know, for all $h \in E$,
$f(x + th) = f(x) + (D_x f)(th) + o(th)$, hence

\[ \lim_{t \to 0} \frac{1}{t} \left( f(x + th) - f(x) \right) = (D_x f)(h) + \lim_{t \to 0} \frac{o(th)}{t} = (D_x f)(h), \]

and part (a) follows.

If $f$ is Gateaux differentiable in the neighborhood $V$ of $x_0 \in U$, there is an
$r > 0$ such that $f$ is Gateaux differentiable at all points $x_0 + h$, $h \in B_r$. Given
$h \in B_r$ it follows that $g(t) = f(x_0 + th)$ is differentiable at all points $t \in [0,1]$ and
$g'(t) = (\delta_{x_0 + th} f)(h)$. This implies

\[ g(1) - g(0) = \int_0^1 g'(t)\, dt = \int_0^1 (\delta_{x_0 + th} f)(h)\, dt \]

and thus

\[ f(x_0 + h) - f(x_0) - (\delta_{x_0} f)(h) = g(1) - g(0) - (\delta_{x_0} f)(h)
   = \int_0^1 \left[ (\delta_{x_0 + th} f)(h) - (\delta_{x_0} f)(h) \right] dt. \]

The integral can be estimated in norm by

\[ \sup_{0 \le t \le 1} \left\| \delta_{x_0 + th} f - \delta_{x_0} f \right\|_{L(E,F)} \|h\|_E \]

and therefore

\[ \left\| f(x_0 + h) - f(x_0) - (\delta_{x_0} f)(h) \right\|_F \le \sup_{0 \le t \le 1} \left\| \delta_{x_0 + th} f - \delta_{x_0} f \right\|_{L(E,F)} \|h\|_E. \]

Continuity of $\delta_x f$ in $x \in x_0 + B_r$ implies $f(x_0 + h) - f(x_0) - (\delta_{x_0} f)(h) = o(h)$
and thus $f$ is Fréchet differentiable at $x_0$ and $(D_{x_0} f)(h) = (\delta_{x_0} f)(h)$ for all $h \in B_r$
and therefore for all $h \in E$.

Lemma 34.3 can be very useful in finding the Fréchet derivative of functions. We
give a simple example. On the Banach space $E = L^p(\mathbb{R}^n)$, $1 < p < 2$, consider the
functional

\[ f(u) = \int_{\mathbb{R}^n} |u(x)|^p\, dx, \quad \forall\, u \in E. \]

To prove directly that $f$ is continuously Fréchet differentiable on $E$ is not so simple.
If, however, Lemma 34.3 is used the proof becomes a straightforward calculation.
We only need to verify the hypotheses of this lemma. In the Exercises the reader is
asked to show that there are constants $0 \le c < C < \infty$ such that

\[ c|s|^p \le |1 + s|^p - 1 - ps \le C|s|^p \quad \forall\, s \in \mathbb{R}. \]

Insert $s = t\,h(x)/u(x)$ for all points $x \in \mathbb{R}^n$ with $u(x) \neq 0$ and multiply with $|u(x)|^p$. The
result is

\[ c|th(x)|^p \le |u(x) + th(x)|^p - |u(x)|^p - p\,t\,h(x)\,|u(x)|^{p-1} \operatorname{sgn}(u(x)) \le C|th(x)|^p. \]

Integration of this inequality gives

\[ c|t|^p f(h) \le f(u + th) - f(u) - p\,t \int_{\mathbb{R}^n} h(x) v(x)\, dx \le C|t|^p f(h), \]

where

\[ v(x) = |u(x)|^{p-1} \operatorname{sgn}(u(x)). \]

Note that $v \in L^q(\mathbb{R}^n)$, $\frac{1}{p} + \frac{1}{q} = 1$, and that $L^q(\mathbb{R}^n)$ is (isomorphic to) the topological
dual of $E = L^p(\mathbb{R}^n)$. Since $p > 1$, this estimate allows us to determine easily the Gateaux
derivative of $f$:

\[ \delta f(u,h) = \lim_{t \to 0} \frac{1}{t} \left[ f(u + th) - f(u) \right] = p \int_{\mathbb{R}^n} v(x) h(x)\, dx. \]

Hölder's inequality implies that the absolute value of this integral is bounded by
$\|v\|_q \|h\|_p$, hence $h \mapsto \delta f(u,h)$ is a continuous linear functional on $E$ and

\[ \|\delta_u f\|_{E'} = p\,\|v\|_q = p\,\|u\|_p^{p/q}. \]

Therefore, $u \mapsto \delta_u f$ is a continuous map from $E \to L(E,\mathbb{R})$ and Lemma 34.3
implies that $f$ is Fréchet differentiable with derivative $D_u f(h) = \delta_u f(h)$.
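
The formula $\delta f(u,h) = p \int v(x) h(x)\, dx$ can be compared with difference quotients on a discretized version of the functional. The following is a rough sketch under assumptions that are not in the text: the integral over $\mathbb{R}^n$ is replaced by a midpoint Riemann sum on the interval $[-1, 1]$, the exponent `P`, the grid size `n`, and the sample functions `u` and `h` are arbitrary illustrative choices.

```python
import math

P = 1.5  # any exponent with 1 < p < 2; the value is an arbitrary choice

def f(u, dx):
    # Riemann-sum stand-in for the integral of |u(x)|^p.
    return sum(abs(ui) ** P for ui in u) * dx

def gateaux_formula(u, h, dx):
    # p * integral of v h with v = |u|^(p-1) sgn(u), discretized on the same grid.
    return P * sum(math.copysign(abs(ui) ** (P - 1), ui) * hi
                   for ui, hi in zip(u, h)) * dx

def difference_quotient(u, h, dx, t):
    # (f(u + t h) - f(u)) / t, the quotient whose limit defines the Gateaux differential.
    shifted = [ui + t * hi for ui, hi in zip(u, h)]
    return (f(shifted, dx) - f(u, dx)) / t

n = 200
dx = 2.0 / n
xs = [-1.0 + (k + 0.5) * dx for k in range(n)]   # midpoints of the grid cells
u = [math.sin(3 * x) for x in xs]                # hypothetical base point u
h = [math.cos(x) for x in xs]                    # hypothetical direction h
exact = gateaux_formula(u, h, dx)
errors = [abs(difference_quotient(u, h, dx, t) - exact) for t in (1e-2, 1e-3, 1e-4)]
```

The entries of `errors` shrink as $t$ decreases, which is the behavior expected from the bound $C|t|^{p-1} f(h)$ on the error of the difference quotient.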

Suppose that $M$ is a nonempty subset of the real Banach space $E$ which is not
open, for instance $M$ has a nonempty interior and part of the boundary of $M$ belongs
to $M$. Suppose, furthermore, that a function $f : M \to \mathbb{R}$ attains a local minimum at
the boundary point $x_0$. Then we cannot investigate the behavior of $f$ in terms of the
first few Fréchet or Gateaux derivatives of $f$ at the point $x_0$ as we did previously since
this required that a whole neighborhood of $x_0$ is contained in $M$. In such situations
the variations of the function in suitable directions are a convenient tool to study the
local behavior of $f$.

Assume that $h \in E$ and that there is some $r = r_h > 0$ such that $x(t) = x_0 + th \in M$
for all $0 \le t < r$. Then we can study the function $f_h(t) = f(x(t))$ on the interval
$[0, r)$. Certainly, if $f$ has a local minimum at $x_0$, then $f_h(0) \le f_h(t)$ for all $t \in [0, r)$
(if necessary we can decrease the value of $r$) and this gives restrictions on the first
few derivatives of $f_h$, if they exist. These derivatives are then called the variations
of $f$.


Definition 34.5 Let $M \subset E$ be a nonempty subset of the real Banach space $E$ and
$x_0 \in M$. For $h \in E$ suppose that there is an $r > 0$ such that $x_0 + th \in M$ for all
$0 \le t < r$. Then the $n$th variation of $f$ in the direction $h$ is defined as

\[ \Delta^n f(x_0, h) = \frac{d^n}{dt^n} f(x_0 + th) \Big|_{t=0}, \quad n = 1, 2, \ldots \tag{34.20} \]

if these derivatives exist.
In favorable situations obviously the first variation is just the Gateaux derivative:


Lemma 34.4 Suppose that $M$ is a nonempty subset of the real Banach space $E$,
$x_0$ an interior point of $M$, and $f$ a real valued function on $M$. Then the Gateaux
derivative $\delta_{x_0} f$ of $f$ at $x_0$ exists if, and only if, the first variation $\Delta f(x_0, h)$ exists
for all $h \in E$ and $h \mapsto \Delta f(x_0, h)$ is a continuous linear functional on $E$.

In this case, one has $\Delta f(x_0, h) = \delta_{x_0} f(h)$.


Proof A straightforward inspection of the respective definitions easily proves this
lemma.
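
At a boundary point only one-sided difference quotients are available, and a minimum there does not force the first variation to vanish. The snippet below is a small sketch under assumptions: the set $M = [0,1] \subset \mathbb{R}$, the function `f`, and the helper `first_variation` are hypothetical and chosen only to illustrate Definition 34.5.

```python
def first_variation(func, x0, h, t=1e-6):
    # One-sided difference quotient for d/dt func(x0 + t h) at t = 0.
    # Only points x0 + t h with t >= 0 are evaluated, so x0 may lie on the boundary of M.
    return (func(x0 + t * h) - func(x0)) / t

def f(x):
    # Hypothetical example on M = [0, 1]; its minimum is at the boundary point x0 = 0.
    return x + x * x

# Direction h = 1 points into M; the first variation is positive, not zero.
variation = first_variation(f, 0.0, 1.0)
```

Here the minimum at $x_0 = 0$ only yields the restriction $\Delta f(x_0, h) \ge 0$ for the admissible direction $h = 1$, in contrast to the interior case of Theorem 34.3.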




34.5 Exercises 




1. Complete the proof of Lemma 34.1.

2. Let $E$ and $F$ be two real normed spaces and $A : E \to F$ a continuous linear
operator such that $Ax = o(x)$ for all $x \in E$, $\|x\| < 1$. Prove: $A = 0$.

3. For a function $f : U \to \mathbb{R}^m$, $U \subset \mathbb{R}^n$ open, assume that it is differentiable at a
point $x_0 \in U$. Use Definition 34.1 to determine the Fréchet derivative $f'(x_0)$ of
$f$ at $x_0$ and relate it to the Jacobi matrix $\frac{\partial f}{\partial x}(x_0)$ of $f$ at $x_0$.

4. Prove part (a) of Proposition 34.1.

5. Prove that $o(h) = g'(y)(o_1(h)) + o_2(f'(x)(h) + o_1(h))$ is a higher order term,
under the assumptions of Proposition 34.1, part (b).

6. Let $I = [a,b]$ be some finite closed interval. Equip the space $E = C^1(I,\mathbb{R})$ of
all continuously differentiable real valued functions (one-sided derivatives at the
end points of the interval) with the norm

\[ \|u\|_{I,1} = \sup \left\{ |u^{(j)}(t)| : t \in I,\ j = 0, 1 \right\}. \]

Under this norm $E = C^1(I,\mathbb{R})$ is a Banach space. For a given continuously
differentiable function $F : I \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, define a function $f : E \to \mathbb{R}$ by

\[ f(u) = \int_a^b F(t, u(t), u'(t))\, dt \]

and show that $f$ is Fréchet differentiable on $E$. Show in particular

\[ f'(u)(v) = \int_a^b \left[ F_u(t, u(t), u'(t)) - \frac{d}{dt} F_{u'}(t, u(t), u'(t)) \right] v(t)\, dt
   + F_{u'}(t, u(t), u'(t))\, v(t) \Big|_a^b \]

for all $v \in E$. $F_u$ denotes the derivative of $F$ with respect to the second argument
and similarly, $F_{u'}$ denotes the partial derivative with respect to the third argument.
Now consider $M = \{u \in E : u(a) = c,\ u(b) = d\}$ for some given values $c, d \in
\mathbb{R}$ and show that the derivative of the restriction of $f$ to $M$ is

\[ f'(u)(v) = \int_a^b \left[ F_u(t, u(t), u'(t)) - \frac{d}{dt} F_{u'}(t, u(t), u'(t)) \right] v(t)\, dt \tag{34.21} \]

for all $v \in E$, $v(a) = 0 = v(b)$. Deduce the Euler-Lagrange equation

\[ F_u(t, u(t), u'(t)) - \frac{d}{dt} F_{u'}(t, u(t), u'(t)) = 0. \tag{34.22} \]

Hint: Use the Taylor expansion with remainder for $F$ and the arguments for the
proof of Theorem 3.2.

7. Suppose that $E$, $F$ are two real Banach spaces. Prove the existence of the natural
isomorphism $L(E, L(E,F)) \cong B(E \times E, F)$.

Hint: For $h \in L(E, L(E,F))$ define $\hat{h} \in B(E \times E, F)$ by $\hat{h}(e_1, e_2) =
h(e_1)(e_2)$ for all $e_1, e_2 \in E$ and for $b \in B(E \times E, F)$ let us define $\check{b} \in
L(E, L(E,F))$ by $\check{b}(e_1)(e_2) = b(e_1, e_2)$ and then show that these mappings are
inverse to each other. Write the definition of the norms of the spaces $L(E, L(E,F))$
and $B(E \times E, F)$ explicitly and show that the mappings $h \mapsto \hat{h}$ and $b \mapsto \check{b}$ both
have a norm $\le 1$.

8. Prove the existence of the natural isomorphism (34.9) for $n = 2, 3, 4, \ldots$.
9. Prove the chain rule for the Gateaux derivative.
10. Complete the proof of part (d) of Lemma 34.3.


11. Let $V$ be a function $\mathbb{R}^3 \to \mathbb{R}$ of class $C^1$. Find the Euler-Lagrange equation
(34.22) explicitly for the functional $I(u) = \int_a^b \left( \frac{m}{2} \dot{u}(t)^2 - V(u(t)) \right) dt$ on
differentiable functions $u : [a,b] \to \mathbb{R}^3$, $u(a) = x$, $u(b) = y$ for given points
$x, y \in \mathbb{R}^3$.

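12. Consider the function $g(s) = \frac{|1 + s|^p - 1 - ps}{|s|^p}$, $s \in \mathbb{R} \setminus \{0\}$, and show $g(s) \to 1$ as
$|s| \to \infty$, $g(s) \to 0$ as $s \to 0$. Conclude that there are constants $0 \le c < C <
\infty$ such that $c|s|^p \le g(s)|s|^p \le C|s|^p$ for all $s \in \mathbb{R}$.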

Reference 


1. Dewitt-Morette C, Dillard-Bleick M, Choquet-Bruhat Y. Analysis, manifolds and physics.
Amsterdam: North-Holland; 1982.


Chapter 35 
Constrained Minimization Problems (Method 
of Lagrange Multipliers) 


In the calculus of variations we often have to deal with the following problem: Given
a real valued function $f$ on a nonempty open subset $U$ of a real Banach space
$E$, find the minimum (maximum) of $f$ on all those points $x$ in $U$ which satisfy a
certain restriction or constraint. A very important example of such a constraint is
that the points have to belong to a level surface of some function $g$, i.e., have to
satisfy $g(x) = c$ where the constant $c$ distinguishes the various level surfaces of the
function $g$. In elementary situations, and typically also in Lagrangian mechanics,
one introduces a so-called Lagrange multiplier $\lambda$ as a new variable and proceeds to
minimize the function $f(\cdot) + \lambda (g(\cdot) - c)$ on the set $U$. In simple problems (typically
finite dimensional) this strategy is successful. The problem is the existence of a
Lagrange multiplier.

As numerous successful applications have shown the following setting is an 
appropriate framework for such constrained minimization problems: 


Let $E$, $F$ be two real Banach spaces, $U \subset E$ an open nonempty subset, $g : U \to F$ a
mapping of class $C^1$, $f : U \to \mathbb{R}$ a function of class $C^1$, and $y_0$ some point in $F$. The
optimization problem for the function $f$ under the constraint $g(x) = y_0$ is the problem of
finding extremal points of the function $f|_M : M \to \mathbb{R}$ where $M = [g = y_0]$ is the level
surface of $g$ through the point $y_0$.


In this chapter we present a comprehensive solution for the infinite dimensional 
case, mainly based on ideas of Ljusternik [1]. A first section explains in a simple 
setting the geometrical interpretation of the existence of a Lagrange multiplier. As 
an important preparation for the main results the existence of tangent spaces to level 
surfaces of $C^1$-functions is shown in substantial generality. Finally the existence of
a Lagrange multiplier is proven and some simple applications are discussed. 

In the following chapter, after the necessary preparations, we will use the results 
on the existence of a Lagrange multiplier to solve eigenvalue problems, for linear 
and nonlinear partial differential operators. 


© Springer International Publishing Switzerland 2015 537 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_35 




Fig. 35.1 Level surfaces $[g = c]$ and $[f = d_i]$, $i = 0, 1, 2$; $d_1 < d_0 < d_2$; $i = 1$: two points of
intersection; $i = 0$: touching level surfaces; $i = 2$: no intersection


35.1 Geometrical Interpretation of Constrained Minimization 


In order to develop some intuition about constrained minimization problems and
the role of the Lagrange multiplier we consider such a problem first on a space
of dimension two and discuss heuristically in geometrical terms how to obtain the
solution. Let $U \subset \mathbb{R}^2$ be a nonempty open subset. Our goal is to determine the
minimum of a continuous function $f : U \to \mathbb{R}$ under the constraint $g(x) = c$
where the constraint function $g : U \to \mathbb{R}$ is continuous. This means: Find $x_0 \in U$
satisfying $g(x_0) = c$ and $f(x_0) \le f(x)$ for all $x \in U$ such that $g(x) = c$. In
this generality the problem need not have a solution. If however both $f$ and $g$ are
continuously differentiable on $U$, then the level surfaces of both functions have
well-defined tangents, and then we expect a solution to exist, because of the following
heuristic considerations.
Introduce the level surface

\[ [g = c] = \{x \in U : g(x) = c\} \]

and similarly the family of level surfaces $[f = d]$, $d \in \mathbb{R}$, for the function $f$. If a
level surface $[f = d]$ does not intersect the level surface $[g = c]$, then no point on this
level surface of $f$ satisfies the constraint and is thus not relevant for our problem. If
for a certain value of $d$ the level surfaces $[f = d]$ and $[g = c]$ intersect in exactly one
point (at some finite angle), then for all values $d'$ close to $d$ the level surfaces $[g = c]$
and $[f = d']$ also intersect at exactly one point, and thus $d$ is not the minimum
of $f$ under the constraint $g(x) = c$. Next consider a value of $d$ for which the level
surfaces $[g = c]$ and $[f = d]$ intersect in at least two distinct points (at finite angles).
Again for all values $d'$ sufficiently close to $d$ the level surfaces $[f = d']$ and $[g = c]$
intersect in at least two distinct points and therefore $d$ is not the minimum of $f$
under the given constraint. Finally consider a value $d_0$ for which the level surfaces
$[g = c]$ and $[f = d_0]$ "touch" in exactly one point $x_0$, i.e., $[g = c] \cap [f = d_0] = \{x_0\}$
and the tangents to both level surfaces at this point coincide. In this situation small
changes of the value of $d$ lead to an intersection which is either empty or consists
of at least two points, hence these values $d' \neq d_0$ do not produce a minimum under
the constraint $g(x) = c$. We conclude that $d_0$ is the minimum value of $f$ under the
given constraint and that $x_0$ is the minimizing point. The following figure shows in
a two dimensional problem three of the cases discussed above (Fig. 35.1). Given the
level surface $[g = c]$ of the constraint function $g$, three different level surfaces of the
function $f$ are considered.




Consider the level surfaces $[g = c]$ and $[f = d]$ of smooth functions $g$, $f$ over an
open set $U \subset \mathbb{R}^2$. Assume (or prove under appropriate assumptions with the help of
the implicit function theorem) that in a neighborhood of the point $x_0 = (x_1^0, x_2^0)$ these
level surfaces have the explicit representation $x_2 = y(x_1)$, respectively $x_2 = \xi(x_1)$.
Under these assumptions it is shown in the Exercises that the tangents to these touching
level surfaces coincide if, and only if, for some $\lambda \in \mathbb{R} \cong L(\mathbb{R},\mathbb{R})$

\[ (Df)(x_0) = \lambda\, (Dg)(x_0). \tag{35.1} \]
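
The tangency condition (35.1) can be observed on a concrete two dimensional problem. The sketch below rests on assumptions that are not in the text: the objective `f`, the constraint function `g` with level $c = 2$, and the brute-force search over sample points of the circle are hypothetical choices made only for illustration.

```python
import math

def f(x, y):
    return x + y            # hypothetical objective

def g(x, y):
    return x * x + y * y    # hypothetical constraint function, level c = 2

def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

# Sample the level surface [g = 2] (a circle of radius sqrt(2)) and pick the
# sample point at which f is smallest.
radius = math.sqrt(2.0)
samples = [(radius * math.cos(2 * math.pi * k / 3600),
            radius * math.sin(2 * math.pi * k / 3600)) for k in range(3600)]
x0 = min(samples, key=lambda point: f(*point))

# Touching level surfaces: Df(x0) and Dg(x0) are parallel, so the determinant
# formed from the two gradients vanishes, and Df(x0) = multiplier * Dg(x0).
(fx, fy), (gx, gy) = grad_f(*x0), grad_g(*x0)
determinant = fx * gy - fy * gx
multiplier = fx / gx
```

In this example the minimizing point is $x_0 = (-1, -1)$, where the line $[f = -2]$ touches the circle, and the multiplier in (35.1) is $\lambda = -1/2$.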


35.2 Tangent Spaces of Level Surfaces 


In our setting a constrained minimization problem is a problem of analysis on level
surfaces of $C^1$ mappings. It requires that we can do differential calculus on these
surfaces which in turn relies on the condition that these level surfaces are
differential manifolds. The following approach does not assume this but works under the
hypothesis that one has, at the points of interest on these level surfaces, the essential
element of a differential manifold, namely a proper tangent space.

Recall that in infinite dimensional Banach spaces $E$ a closed subspace $K$ does
not always have a topological complement, i.e., a closed subspace $L$ such that $E$ is
the direct sum of these two subspaces (see for instance [2]). Thus in our fundamental
result on the existence of a proper tangent space this property is assumed but later
we will show when and how it holds.


Theorem 35.1 (Existence of a Tangent Space) Let $E$, $F$ be real Banach spaces,
$U \subseteq E$ a nonempty open subset, and $g : U \to F$ a mapping of class $C^1$. Suppose
that $x_0$ is a point of the level surface $[g = y_0]$ of the mapping $g$. If $x_0$ is a regular
point of $g$ at which the null-space $N(g'(x_0))$ of the derivative of $g$ has a topological
complement in $E$, then the set

\[ T_{x_0}[g = y_0] = \{x \in E : \exists\, u \in N(g'(x_0)),\ x = x_0 + u\} = x_0 + N(g'(x_0)) \tag{35.2} \]

is a proper tangent space of the level surface $[g = y_0]$ at the point $x_0$, i.e., there is a
homeomorphism $\chi$ of a neighborhood $U'$ of $x_0$ in $T_{x_0}[g = y_0]$ onto a neighborhood
$V$ of $x_0$ in $[g = y_0]$ with the following properties:

a) $\chi(x_0 + u) = x_0 + u + \varphi(u)$ for all $x_0 + u \in U'$;
b) $\varphi$ is continuous and of higher than linear order in $u$, $\varphi(u) = o(u)$.


Proof Since $x_0$ is a regular point of $g$, the derivative $g'(x_0)$ is a surjective continuous
linear mapping from $E$ onto $F$. By assumption the null-space $K = N(g'(x_0))$ of this
mapping has a topological complement $L$ in $E$ so that the Banach space $E$ is the
direct sum of these two closed subspaces, $E = K + L$. It follows (see [2]) that there
are continuous linear mappings $p$ and $q$ of $E$ onto $K$ and $L$, respectively, which have
the following properties: $K = \operatorname{ran} p = N(q)$, $L = N(p) = \operatorname{ran} q$, $p^2 = p$, $q^2 = q$,
$p + q = \mathrm{id}$.

Since $U$ is open there is $r > 0$ such that the open ball $B_r$ in $E$ with center 0 and
radius $r$ satisfies $x_0 + B_r + B_r \subset U$. Now define a mapping $\psi : (K \cap B_r) \times (L \cap B_r) \to F$
by

\[ \psi(u,v) = g(x_0 + u + v) \quad \forall\, u \in K \cap B_r,\ \forall\, v \in L \cap B_r. \tag{35.3} \]

By the choice of the radius $r$ this map is well defined. The chain rule implies that it
has the following properties: $\psi(0,0) = g(x_0) = y_0$, $\psi$ is continuously differentiable
and

\[ \psi_{,u}(0,0) = g'(x_0)|_K = 0 \in L(K,F), \qquad \psi_{,v}(0,0) = g'(x_0)|_L \in L(L,F). \]

On the complement $L$ of its null-space the surjective mapping $g'(x_0) : E \to F$
is bijective, thus $\psi_{,v}(0,0)$ is a bijective continuous linear mapping of the Banach
space $L$ onto the Banach space $F$. The inverse mapping theorem (see Appendix
34.5) implies that the inverse $\psi_{,v}(0,0)^{-1} : F \to L$ is a continuous linear operator
too. Thus all hypotheses of the implicit function theorem (see, for example, [3]) are
satisfied for the problem

\[ \psi(u,v) = y_0. \]

This theorem implies that there is $0 < \delta < r$ and a unique function $\varphi : K \cap B_\delta \to L$
which is continuously differentiable such that

\[ y_0 = \psi(u, \varphi(u)) \quad \forall\, u \in K \cap B_\delta \quad \text{and} \quad \varphi(0) = 0. \]

Since in general $\varphi'(0) = -\psi_{,v}(0,0)^{-1} \psi_{,u}(0,0)$ we have here $\varphi'(0) = 0$ and thus
$\varphi(u) = o(u)$.

Define a mapping $\chi : x_0 + K \cap B_\delta \to M$ by $\chi(x_0 + u) = x_0 + u + \varphi(u)$. Clearly
$\chi$ is continuous. By construction, $y_0 = \psi(u, \varphi(u)) = g(x_0 + u + \varphi(u))$, hence $\chi$
maps into $M = [g = y_0]$. By construction, $u$ and $\varphi(u)$ belong to complementary
subspaces of $E$, therefore $\chi$ is injective and thus invertible on

\[ V = \{x_0 + u + \varphi(u) : u \in K \cap B_\delta\} \subset M. \]

Its inverse is $\chi^{-1}(x_0 + u + \varphi(u)) = x_0 + u$. Since $\operatorname{ran} p = K$ and $N(p) = L$ the
inverse can be represented as

\[ \chi^{-1}(x_0 + u + \varphi(u)) = x_0 + p(u + \varphi(u)) \]

and this shows that $\chi^{-1}$ is continuous too. Therefore $\chi$ is a homeomorphism from
$U' = x_0 + K \cap B_\delta$ onto $V \subset M$. This concludes the proof.
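
For a concrete picture of the maps $\chi$ and $\varphi$ in this proof, consider the unit circle in $\mathbb{R}^2$. The sketch below is an illustration under assumptions not made in the text: $g(x) = x_1^2 + x_2^2$, $y_0 = 1$, $x_0 = (1, 0)$, so that $K = N(g'(x_0))$ is the $x_2$-axis and $L$ is chosen as the $x_1$-axis. The function names `phi` and `chi` mirror the symbols of the theorem.

```python
import math

def phi(u):
    # Component in L = span{(1, 0)} that solves g(x0 + u e2 + phi e1) = 1 on the unit circle.
    return math.sqrt(1.0 - u * u) - 1.0

def chi(u):
    # chi(x0 + u) = x0 + u + phi(u) with x0 = (1, 0) and u in K = span{(0, 1)}.
    return (1.0 + phi(u), u)

def g(point):
    return point[0] ** 2 + point[1] ** 2

# phi(u) = o(u): the quotient |phi(u)| / |u| tends to 0 as u -> 0.
ratios = [abs(phi(u)) / abs(u) for u in (1e-1, 1e-2, 1e-3)]
# chi maps the tangent line into the level surface [g = 1].
residuals = [abs(g(chi(u)) - 1.0) for u in (-0.5, -0.1, 0.1, 0.5)]
```

The quotients in `ratios` behave like $|u|/2$, which reflects property (b), and the values in `residuals` vanish up to rounding, which reflects that $\chi$ maps into $[g = y_0]$.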

Apart from the natural assumption about the regularity of the point $x_0$ this theorem
uses the technical assumption that the nullspace $K = N(g'(x_0))$ of $g'(x_0) \in L(E,F)$
has a topological complement in $E$. We show now that this assumption is quite
adequate for the general setting by proving that it is automatically satisfied for three
large and frequent classes of special cases.




Proposition 35.1 Let $E$, $F$ be real Banach spaces and $A : E \to F$ a surjective
continuous linear operator. The nullspace $K = N(A)$ has a topological complement
in $E$, in the following three cases:

a) $E$ is a Hilbert space;

b) $F$ is a finite dimensional Banach space;

c) $N(A)$ is finite dimensional, for instance $A : E \to F$ is a Fredholm operator
(i.e., an operator with finite dimensional null-space and closed range of finite
codimension).


Proof If $K$ is a closed subspace of the Hilbert space $E$, the projection theorem
guarantees existence of the topological complement $L = K^\perp$ and thus proves Part
a).

If $F$ is a finite dimensional Banach space, there exist linearly independent vectors
$e_1, \ldots, e_m \in E$ such that $\{f_1 = Ae_1, \ldots, f_m = Ae_m\}$ is a basis of $F$. The vectors
$e_1, \ldots, e_m$ generate a linear subspace $V$ of $E$ of dimension $m$ and it follows that
$A$ now is represented by $Ax = \sum_{j=1}^m a_j(x) f_j$ with continuous linear functionals
$a_j : E \to \mathbb{R}$. Define $px = \sum_{j=1}^m a_j(x) e_j$ and $qx = x - px$. One proves easily that
$p^2 = p$, $q^2 = q$, $p + q = \mathrm{id}$, $V = pE$ and that both maps are continuous. Thus
$V = pE$ is the topological complement of $N(A) = qE$. This proves b).

Suppose $\{e_1, \ldots, e_m\}$ is a basis of $N(A)$. There are continuous linear functionals
$a_i$ on $E$ such that $a_i(e_j) = \delta_{ij}$ for $i, j = 1, \ldots, m$. (Use the Hahn-Banach theorem).
As above define $px = \sum_{j=1}^m a_j(x) e_j$ and $qx = x - px$ for all $x \in E$. Now we
conclude as in Part b). (See the Exercises)
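
The construction of the projections $p$ and $q$ in part b) can be carried out by hand for a small matrix. The sketch below is a minimal illustration under assumptions: the map `A` from $\mathbb{R}^3$ onto $\mathbb{R}^2$ and the vectors `E1`, `E2` are hypothetical choices, made so that $Ae_1$, $Ae_2$ is the standard basis of $\mathbb{R}^2$ and the functionals $a_j(x)$ are simply the coordinates of $Ax$.

```python
def A(x):
    # Hypothetical surjective linear map R^3 -> R^2.
    return (x[0] + x[2], x[1])

# A(E1) = (1, 0) and A(E2) = (0, 1) form the standard basis of F = R^2.
E1, E2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

def p(x):
    # p x = a_1(x) e_1 + a_2(x) e_2, where a_j(x) are the coordinates of A x.
    a1, a2 = A(x)
    return tuple(a1 * u + a2 * v for u, v in zip(E1, E2))

def q(x):
    # q x = x - p x projects onto the null-space N(A).
    px = p(x)
    return tuple(xi - pi for xi, pi in zip(x, px))

x = (2.0, -3.0, 5.0)   # arbitrary test vector
```

For this test vector `p(x)` lies in $V = \operatorname{span}\{e_1, e_2\}$, `q(x)` lies in $N(A)$, and the two pieces add up to `x`, which is the direct sum decomposition used in the proof.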


Corollary 35.1 Suppose that $E$, $F$ are real Banach spaces, $U \subset E$ a nonempty
open set and $g : U \to F$ a map of class $C^1$. In each of the three cases mentioned in
Proposition 35.1 for $A = g'(x_0)$ the tangent space of the level surface $[g = y_0]$ at
every regular point $x_0 \in [g = y_0]$ of $g$ is given by Eq. (35.2).


Proof Proposition 35.1 ensures the hypotheses of Theorem 35.1. 


35.3 Existence of Lagrange Multipliers


The results on the existence of the tangent spaces of level surfaces allow us to translate 
the heuristic considerations on the existence of a Lagrange multiplier into precise 
statements. The result which we present now is primarily useful for the explicit 
calculation of the extremal points once their existence has been established, say as a 
consequence of the direct methods discussed earlier. 


Theorem 35.2 (Existence of Lagrange Multipliers) Let $E$, $F$ be real Banach
spaces, $U \subset E$ open and nonempty, $g : U \to F$ and $f : U \to \mathbb{R}$ of class $C^1$.
Suppose that $f$ has a local extremum at the point $x_0 \in U$ subject to the constraint
$g(x) = y_0 = g(x_0)$. If $x_0$ is a regular point of the map $g$ and if the null-space
$K = N(g'(x_0))$ of $g'(x_0)$ has a topological complement $L$ in $E$, then there exists a
continuous linear functional $\ell : F \to \mathbb{R}$ such that $x_0$ is a critical point of the function
$f - \ell \circ g : U \to \mathbb{R}$, that is

\[ f'(x_0) = \ell \circ g'(x_0). \tag{35.4} \]


Proof The restriction $H$ of $g'(x_0)$ to the topological complement $L$ of its kernel $K$ is
a continuous injective linear map from the Banach space $L$ onto the Banach space $F$
since $x_0$ is a regular point of $g$. The inverse mapping theorem (see Appendix) implies
that $H$ has an inverse $H^{-1}$ which is a continuous linear operator $F \to L$.

According to Theorem 35.1 the level surface $[g = y_0]$ has a proper tangent space
at $x_0$. Thus the points $x$ of this level surface, in a neighborhood $V$ of $x_0$, are given by
$x = x_0 + u + \varphi(u)$, $u \in K \cap B_\delta$, where $\delta > 0$ is chosen as in the proof of Theorem
35.1. Suppose that $f$ has a local minimum at $x_0$ (otherwise consider $-f$). Then there
is an $r \in (0, \delta)$ such that $f(x_0) \le f(x_0 + u + \varphi(u))$ for all $u \in K \cap B_r$, hence by
Taylor's theorem

\[ 0 \le f'(x_0)(u) + f'(x_0)(\varphi(u)) + o(u + \varphi(u)) \quad \forall\, u \in K \cap B_r. \]

Since we know that $\varphi(u) = o(u)$, this implies, by the argument used in the proof of
Theorem 34.3 applied to $u$ and $-u$, that $f'(x_0)(u) = 0$ for all $u \in K \cap B_r$. But
$K \cap B_r$ is absorbing in $K$, therefore $f'(x_0)(u) = 0$ for all $u \in K$, i.e.,

\[ K = N(g'(x_0)) \subseteq N(f'(x_0)). \tag{35.5} \]

By assumption, $E$ is the direct sum of the closed subspaces $K$, $L$, $E = K + L$.
Denote the canonical projections onto $K$ and $L$ by $p$ respectively $q$. If $x_1, x_2 \in E$
satisfy $q(x_1) = q(x_2)$, then $x_1 - x_2 \in K$ and thus Eq. (35.5) implies $f'(x_0)(x_1) =
f'(x_0)(x_2)$. Therefore a continuous linear functional $\hat{f}'(x_0) : L \to \mathbb{R}$ is well defined
by $\hat{f}'(x_0)(qx) = f'(x_0)(x)$ for all $x \in E$. This functional is used to define

\[ \ell = \hat{f}'(x_0) \circ H^{-1} : F \to \mathbb{R} \]

as a continuous linear functional on the Banach space $F$ which satisfies Eq. (35.4),
since for every $x \in E$

\[ \ell \circ g'(x_0)(x) = \ell \circ g'(x_0)(qx) = \ell \circ H(qx) = \hat{f}'(x_0)(qx) = f'(x_0)(x). \]

We conclude that $x_0$ is a critical point of the function $f - \ell \circ g$, by using the
chain rule.

To illustrate some of the strengths of this theorem we consider a simple example.
Suppose $E$ is a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and $A$ a bounded self-adjoint
operator on $E$. The problem is to minimize the function $f(x) = \langle x, Ax\rangle$ under the
constraint $g(x) = \langle x, x\rangle = 1$. Obviously both functions are of class $C^1$. Their
derivatives are given by $f'(x)(u) = 2\langle Ax, u\rangle$, respectively by $g'(x)(u) = 2\langle x, u\rangle$ for
all $u \in E$. It follows that all points of the level surface $[g = 1]$ are regular points
of $g$. Corollary 35.1 implies that Theorem 35.2 can be used to infer the existence of




a Lagrange multiplier $\lambda \in \mathbb{R}$ if $x_0$ is a minimizing point of $f$ under the constraint
$g(x) = 1$: $f'(x_0) = \lambda g'(x_0)$ or $Ax_0 = \lambda x_0$, i.e., the Lagrange multiplier $\lambda$ is
an eigenvalue of the operator $A$ and $x_0$ is the corresponding normalized eigenvector.
This simple example suggests a strategy to determine eigenvalues of operators. Later
we will explain this powerful strategy in some detail, not only for linear operators.
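The eigenvalue example can be tried out numerically in finite dimensions. The following is a minimal sketch, not a method taken from the text: it assumes a small symmetric matrix `A` chosen for illustration and uses projected gradient descent on the unit sphere (the function name `minimize_on_sphere` and the step size are our own choices). At the computed minimizer the Lagrange condition $Ax_0 = \lambda x_0$ should hold up to rounding.

```python
import numpy as np

def minimize_on_sphere(A, x0, steps=2000):
    """Projected gradient descent for f(x) = <x, Ax> on the unit sphere."""
    t = 1.0 / (4.0 * np.linalg.norm(A, 2))   # conservative step size (assumption)
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        lam = x @ A @ x                       # current value of f, candidate multiplier
        grad = A @ x - lam * x                # tangential part of f'(x) / 2
        x = x - t * grad
        x /= np.linalg.norm(x)                # project back onto [g = 1]
    return x @ A @ x, x

# Illustrative symmetric matrix; its smallest eigenvalue is 2 - sqrt(2).
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
lam, x = minimize_on_sphere(A, np.array([1.0, 0.3, -0.2]))
residual = np.linalg.norm(A @ x - lam * x)    # defect in A x0 = lambda x0
print(lam, residual)
```

The returned multiplier can be compared with the smallest entry of `np.linalg.eigvalsh(A)`.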

In the case of finite dimensional Banach spaces we know that the technical assumptions
of Theorem 35.2 are naturally satisfied. In this theorem assume that $E = \mathbb{R}^n$
and $F = \mathbb{R}^m$. Every continuous linear functional $\ell$ on $\mathbb{R}^m$ is characterized uniquely
by some $m$-tuple $(\lambda_1, \dots, \lambda_m)$ of real numbers. Explicitly Theorem 35.2 now takes
the form


Corollary 35.2 Suppose that $U \subset \mathbb{R}^n$ is open and nonempty, and consider two
mappings $f : U \to \mathbb{R}$ and $g : U \to \mathbb{R}^m$ of class $C^1$. Furthermore assume that the
function $f$ attains a local extremum at a regular point $x_0 \in U$ of the mapping $g$ (i.e.,
the Jacobi matrix $g'(x_0)$ has maximal rank $m$) under the constraint $g(x) = y_0 \in \mathbb{R}^m$.
Then there exist real numbers $\lambda_1, \dots, \lambda_m$ such that

$$\frac{\partial f}{\partial x_i}(x_0) = \sum_{j=1}^{m} \lambda_j \frac{\partial g_j}{\partial x_i}(x_0), \quad i = 1, \dots, n. \tag{35.6}$$


Note that Eq. (35.6) of Corollary 35.2 and the equation $g(x_0) = y_0 \in \mathbb{R}^m$ give us
exactly $n + m$ equations to determine the $n + m$ unknowns $(\lambda, x_0) \in \mathbb{R}^m \times U$.
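To make the count of $n + m$ equations concrete, here is a small sketch for $n = 2$, $m = 1$. The functions $f(x,y) = x + y$ and $g(x,y) = x^2 + y^2$ with level $y_0 = 2$ are our own illustrative choices, as are the Newton iteration and its starting point; the three equations are the two components of (35.6) together with the constraint.

```python
import numpy as np

def lagrange_system(z):
    """Residual of grad f = lam * grad g and g = 2 for f = x + y, g = x^2 + y^2."""
    x, y, lam = z
    return np.array([1.0 - 2.0 * lam * x,
                     1.0 - 2.0 * lam * y,
                     x**2 + y**2 - 2.0])

def jacobian(z):
    """Jacobi matrix of lagrange_system with respect to (x, y, lam)."""
    x, y, lam = z
    return np.array([[-2.0 * lam, 0.0, -2.0 * x],
                     [0.0, -2.0 * lam, -2.0 * y],
                     [2.0 * x, 2.0 * y, 0.0]])

z = np.array([-0.8, -1.2, -0.4])      # initial guess near the constrained minimum
for _ in range(20):                   # plain Newton iteration on the 3 x 3 system
    z = z - np.linalg.solve(jacobian(z), lagrange_system(z))

x0, y0, lam = z
print(x0, y0, lam)
```

Started near the constrained minimum, the iteration should approach $x_0 = (-1,-1)$ with multiplier $\lambda = -1/2$; a start near $(1,1)$ would instead find the constrained maximum.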

Theorem 35.2 can also be used to derive necessary and sufficient conditions for 
extremal points under constraints. For more details we have to refer to Chap. 4 of 
the book [4]. 


35.3.1 Comments on Dido’s Problem 


According to the brief discussion in the introduction to Part C Dido’s original prob- 
lem is a paradigmatic example of constrained minimization. Though intuitively the 
solution is clear (a circle where the radius is determined by the given length) a rigor- 
ous proof is not very simple even with the help of the abstract results which we have 
developed in this section. Naturally Dido’s problem and its solution have been dis- 
cussed much in the history of the calculus of variations (see [5]). Weierstrass solved 
this problem in his lectures in 1872 and 1879. There is also an elegant geometrical 
solution based on symmetry considerations due to Steiner. 

In the Exercises we invite the reader to find the solution by two different methods.
The first method suggests parametrizing the curve we are looking for by its arc length
and using Parseval's relation in the Hilbert space $\mathcal{H} = L^2([0,2\pi])$. This means
that we assume that this curve is given in parametric form by a parametrization
$(x(t), y(t)) \in \mathbb{R}^2$, $0 \le t \le 2\pi$, where $x, y$ are differentiable functions satisfying
$\dot x(t)^2 + \dot y(t)^2 = 1$ for all $t \in [0, 2\pi]$. With this normalization and parametrization the
total length of the curve is $L = \int_0^{2\pi} \sqrt{\dot x(t)^2 + \dot y(t)^2}\,dt = 2\pi$ and the area enclosed


by this curve is 


$$A = \int_0^{2\pi} x(t)\,\dot y(t)\,dt.$$


Proposition 35.2 For all parametrizations of the form described above one has
$A \le \pi$. Furthermore, $A = \pi$ if, and only if, the curve is a circle of radius 1.


Proof See the Exercises. 
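The inequality $A \le \pi$ can be probed numerically before attempting the proof. The sketch below is only an illustration under our own assumptions: closed curves are replaced by polygons through 20000 sample points, each curve is rescaled to total length $2\pi$, and the ellipse with half-axes 2 and 1 is an arbitrary test curve.

```python
import numpy as np

def length_and_area(x, y):
    """Perimeter and enclosed area of the closed polygon through the sample points."""
    dx = np.roll(x, -1) - x
    dy = np.roll(y, -1) - y
    length = np.sum(np.hypot(dx, dy))
    # Shoelace formula, a discrete version of the area integral of x * dy.
    area = 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))
    return length, area

def area_at_length_2pi(x, y):
    """Rescale the curve to total length 2*pi and return the enclosed area."""
    length, area = length_and_area(x, y)
    scale = 2.0 * np.pi / length
    return area * scale**2

t = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
area_circle = area_at_length_2pi(np.cos(t), np.sin(t))
area_ellipse = area_at_length_2pi(2.0 * np.cos(t), np.sin(t))
print(area_circle, area_ellipse)
```

The circle should give a value very close to $\pi$, while the rescaled ellipse stays strictly below it.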

The second approach uses the Lagrange multiplier method as explained above.
Suppose that the curve is to have the total length $2L_0$. Choose a parameter $a$ such
that $2a < L_0$. In a suitable coordinate system the curve we are looking for is given
as $y = u(x)$, $-a \le x \le a$, with $u(x) \ge 0$, $u(\pm a) = 0$ and a function $u$ of class $C^1$.
Its length is $\int_{-a}^{a} \sqrt{1 + u'(x)^2}\,dx = L(u)$ and the area enclosed by the x-axis and this
curve is $A(u) = \int_{-a}^{a} u(x)\,dx$. The problem then is to determine $u$ such that $A(u)$ is
maximal under the constraint $L(u) = L_0$.


Proposition 35.3 For the constrained minimization problem for A(u) under the 
constraint L(u) = Lo there is a Lagrange multiplier x satisfying Fat = { for 


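some $s \in \mathbb{R}$ and a solution $u(x) = \lambda\left[\sqrt{1 - (x/\lambda)^2} - \sqrt{1 - (a/\lambda)^2}\,\right]$, $-a \le x \le a$. One
has $L_0 = 2\lambda\theta(a)$ with $\theta(a) = \arcsin\frac{a}{\lambda} \in [0, \frac{\pi}{2}]$. For this curve the area is

$$A(u) = \lambda^2\theta(a) - a\sqrt{\lambda^2 - a^2}.$$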


Proof See the Exercises. 

Since $L_0 = 2\lambda\theta(a)$ the Lagrange multiplier $\lambda$ is a function of $a$ and hence one
can consider $A(u)$ as a function of $a$. Now it is not difficult to determine $a$ so that the
enclosed area $A(u)$ is maximal. For $a = \lambda = \frac{L_0}{\pi}$ this area is maximal and is given
by $A(u) = a^2\pi/2$. This is the area enclosed by a half-circle of radius $a = \frac{L_0}{\pi}$.

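The dependence of the area on $a$ can be explored numerically. The following sketch rests on our own assumptions: the curve is the circular arc of radius $\lambda \ge a$ through $(\pm a, 0)$, so that $L_0 = 2\lambda\arcsin(a/\lambda)$ and the area is $\lambda^2\theta - a\sqrt{\lambda^2 - a^2}$; the radius is found by bisection (helper name `arc_radius` is hypothetical), and $a$ is scanned over $[L_0/\pi, 0.49\,L_0]$, the range in which such an arc exists as a graph.

```python
import math

def arc_radius(a, L0, iters=200):
    """Solve 2 * lam * asin(a / lam) = L0 for lam >= a by bisection."""
    lo, hi = a, 1e3 * a
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 2.0 * mid * math.asin(a / mid) > L0:
            lo = mid            # arc too long for this radius: increase the radius
        else:
            hi = mid
    return 0.5 * (lo + hi)

def area(a, L0):
    """Area between the x-axis and the circular arc of length L0 over [-a, a]."""
    lam = arc_radius(a, L0)
    theta = math.asin(min(1.0, a / lam))
    return lam**2 * theta - a * math.sqrt(max(0.0, lam**2 - a**2))

L0 = 3.0
a_min, a_max = L0 / math.pi, 0.49 * L0
a_values = [a_min + k * (a_max - a_min) / 200 for k in range(201)]
areas = [area(a, L0) for a in a_values]
best = max(range(len(areas)), key=areas.__getitem__)
print(a_values[best], areas[best])
```

In this scan the largest area should occur at the left end $a = L_0/\pi$, where the arc is a half-circle and the area equals $L_0^2/(2\pi)$.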
Remark 35.1 There is an interesting variation of Dido's problem which has found
important applications in modern probability theory (see [6]) and which we mention
briefly. Let $A \subset \mathbb{R}^n$ be a bounded domain with a sufficiently smooth boundary and
for $t > 0$ consider the set

$$A_t = \{x \in \mathbb{R}^n \setminus A : \|x - y\| \le t \text{ for some } y \in A\}.$$

Now minimize the volume $|A_t|$ of the set $A_t$ under the constraint that the volume $|A|$
of $A$ is fixed. The answer is known: This minimum is attained when $A$ is a ball in
$\mathbb{R}^n$. This is of particular interest in the case of very high dimensions $n \to \infty$ since
then it is known that practically the volume of $A_t \cup A$ is equal to the volume of $A_t$.
For the proof of this result we refer to the book [7] and the article [8].




35.4 Exercises 


1. Let $U \subset \mathbb{R}^2$ be open and nonempty. Suppose $f, g \in C^1(U,\mathbb{R})$ have level surfaces
$[g = c]$ and $[f = d]$ which touch in a point $x_0 \in U$ in which the functions $f, g$ have
nonvanishing derivatives with respect to the second argument. Prove Eq. 35.1.

2. Prove in detail: A finite dimensional subspace $V$ of a Banach space $E$ has a
topological complement.

3. Prove Corollary 35.2.

4. Prove Proposition 35.2.

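Hints: Use the Fourier expansion for $x, y$:

$$x(t) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kt + b_k \sin kt),$$

$$y(t) = \frac{\alpha_0}{2} + \sum_{k=1}^{\infty} (\alpha_k \cos kt + \beta_k \sin kt).$$

Calculate $\dot x(t)$, $\dot y(t)$ and calculate $\int_0^{2\pi} [\dot x(t)^2 + \dot y(t)^2]\,dt$ as
$\langle \dot x, \dot x\rangle_2 + \langle \dot y, \dot y\rangle_2$
using $\langle \cos kt, \sin jt\rangle_2 = 0$ and $\langle \cos kt, \cos kt\rangle_2 = \langle \sin kt, \sin kt\rangle_2 = \pi$. Similarly
one can calculate $A = \langle x, \dot y\rangle_2 = \pi \sum_{k=1}^{\infty} k\,(a_k\beta_k - b_k\alpha_k)$. This gives

$$2\pi - 2A = \pi \sum_{k=1}^{\infty} (k^2 - k)\left[a_k^2 + b_k^2 + \alpha_k^2 + \beta_k^2\right] + \pi \sum_{k=1}^{\infty} k\left[(a_k - \beta_k)^2 + (\alpha_k + b_k)^2\right].$$

Now it is straightforward to conclude.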


5. Prove Proposition 35.3.

Hints: (1) Calculate the Fréchet derivative of the constraint functional $L(u)$ and
show that all points of a level surface $[L = L_0]$ are regular points of the mapping
$L$, for $2a < L_0$. (2) Prove that $|u(x)| \le L(u)$ for all $x \in [-a,a]$ and hence
$A(u) \le 2aL(u) = 2aL_0$. (3) Prove that $A(u)$ is (upper semi-) continuous for
the weak topology on $E = H_0^1(-a,a)$. (4) Conclude that a maximizing element
$u \in E$ and a Lagrange multiplier $\lambda$ exist. (5) Solve the differential equation
$A'(u) = \lambda L'(u)$ under the boundary condition $u(-a) = u(a) = 0$. (6) Calculate
$L(u)$ for this solution and equate the result to $L_0$. (7) Calculate the area $A(u)$ for
this solution.


6. Another solution method for Dido's problem: Suppose $u(t) = (x(t), y(t))$,
$0 \le t \le a$, is the parametrization of a closed differentiable curve in the plane.

a) Show that the length of the curve is given by $\ell(u) = \int_0^a \sqrt{\dot x(t)^2 + \dot y(t)^2}\,dt$
and the enclosed area by $A(u) = \frac{1}{2}\int_0^a (x(t)\dot y(t) - y(t)\dot x(t))\,dt$. Then Dido's
problem reads: Maximize $A(u)$ under the constraint that the length $\ell(u) = L$
has a given value.




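b) Introduce the Lagrange function $L(u, \dot u) = \frac{1}{2}(x\dot y - y\dot x) + \lambda\sqrt{\dot x(t)^2 + \dot y(t)^2}$
with a Lagrange multiplier $\lambda$ to be determined later, corresponding to the
functional $I(u) = A(u) + \lambda\ell(u)$. Show that the Euler-Lagrange equations
$\frac{d}{dt}\frac{\partial L}{\partial \dot u} = \frac{\partial L}{\partial u}$ for this functional are

$$\frac{d}{dt}\left(-\frac{1}{2}y + \frac{\lambda\dot x}{\sqrt{\dot x^2 + \dot y^2}}\right) = \frac{1}{2}\dot y,$$

$$\frac{d}{dt}\left(\frac{1}{2}x + \frac{\lambda\dot y}{\sqrt{\dot x^2 + \dot y^2}}\right) = -\frac{1}{2}\dot x.$$

c) Integrate this system and show that the solutions are circles of radius $\lambda$ and
center $(x_0, y_0)$ as integration constants. Then the Lagrange parameter $\lambda$ is
determined by the condition that the curve has to have a given length $\ell(u) = L$.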
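The Fourier identity in the hints to Exercise 4 can be sanity-checked numerically before proving it. The sketch below is an illustration under our own assumptions: random trigonometric polynomials of degree 5 are used without the arc-length normalization, so $2\pi$ on the left-hand side is replaced by $\int_0^{2\pi}(\dot x^2 + \dot y^2)\,dt$; the integrals are evaluated with an equispaced rule, which is exact for trigonometric polynomials up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
k = np.arange(1, K + 1)
a, b, al, be = rng.normal(size=(4, K))        # a_k, b_k, alpha_k, beta_k

t = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
dt = t[1] - t[0]
c, s = np.cos(np.outer(k, t)), np.sin(np.outer(k, t))

x = a @ c + b @ s                              # constant terms omitted: they drop out
xdot = (k * b) @ c - (k * a) @ s
ydot = (k * be) @ c - (k * al) @ s

# Left-hand side: integral of (xdot^2 + ydot^2) minus twice the area <x, ydot>.
lhs = np.sum(xdot**2 + ydot**2) * dt - 2.0 * np.sum(x * ydot) * dt
# Right-hand side: the two nonnegative sums from the hint.
rhs = (np.pi * np.sum((k**2 - k) * (a**2 + b**2 + al**2 + be**2))
       + np.pi * np.sum(k * ((a - be)**2 + (al + b)**2)))
print(lhs, rhs)
```

Both numbers should agree up to rounding, and the right-hand side is visibly nonnegative, which is the core of the inequality $A \le \pi$.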


References 


1. Ljusternik LA. On conditional extrema of functions. Mat Sbornik. 1934;41(3):390-401.
2. Robertson AP, Robertson WJ. Topological vector spaces. Cambridge: Cambridge University Press; 1973.
3. Dieudonné JA. Foundations of modern analysis. New York: Academic; 1969.
4. Blanchard Ph, Brüning E. Variational methods in mathematical physics. A unified approach. Texts and monographs in physics. Berlin: Springer-Verlag; 1992.
5. Goldstine HH. A history of the calculus of variations from the 17th through the 19th century. Studies in the history of mathematics and physical sciences. vol. 5. New York: Springer-Verlag; 1980.
6. Ledoux M, Talagrand M. Probability in Banach spaces: isoperimetry and processes. Ergebnisse der Mathematik und ihrer Grenzgebiete; Folge 3. vol. 23. Berlin: Springer-Verlag; 1991.
7. Burago YD, Zalgaller VA. Geometric inequalities. Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen. vol. 285. Berlin: Springer-Verlag; 1988.
8. Osserman R. The isoperimetric inequality. Bull Am Math Soc. 1978;84:1182-1238.


Chapter 36 
Boundary and Eigenvalue Problems 


Linear boundary and eigenvalue problems were among the first areas in which
variational concepts and methods were applied. They can typically be solved in
concrete Hilbert spaces of functions. These and related problems are the topic of
this chapter. Before we turn to these concrete problems, we discuss several abstract
minimization problems in Hilbert spaces, some of which have already been mentioned
in Part II on Hilbert spaces.

In order to prepare for the solution of linear boundary and eigenvalue problems, the 
connection of linear partial differential operators and quadratic forms is established 
in Sect. 36.2. Since on a general level minimization of quadratic forms has been 
well discussed, it is fairly easy then to solve these concrete boundary and eigenvalue 
problems. 


36.1 Minimization in Hilbert Spaces 


According to our outline of the general strategy of the direct methods in the calculus 
of variations, Hilbert spaces are well suited for problems of minimization since they 
are spaces in which bounded sets are relatively compact for the weak topology. 
Their additional geometric structure is often very helpful too. This advantage will be 
evident from the following 


Theorem 36.1 (Projection Theorem for Convex Sets) Suppose that $K$ is a convex
closed subset of a Hilbert space $\mathcal{H}$. Then, for any $x \in \mathcal{H}$, there is a unique $u \in K$
which satisfies

$$\|x - u\| = \inf_{v \in K} \|x - v\| = d(x, K). \tag{36.1}$$

The element $u \in K$ is called the projection of $x$ onto $K$: $u = \operatorname{proj}_K x$. It is
characterized by the inequality

$$\langle x - u, v - u\rangle \le 0 \quad \forall v \in K.$$
© Springer International Publishing Switzerland 2015 547 


P. Blanchard, E. Brüning, Mathematical Methods in Physics,
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_36 


The mapping $\operatorname{proj}_K : \mathcal{H} \to K$ defined in this way is continuous, and one has

$$\|\operatorname{proj}_K x - \operatorname{proj}_K y\| \le \|x - y\| \quad \forall x, y \in \mathcal{H}.$$
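Both statements of the theorem can be observed in a finite dimensional example. The sketch below is only an illustration: the Hilbert space is $\mathbb{R}^5$, the convex set is the box $K = [-1,1]^5$ (our choice), and the projection onto a box is componentwise clipping. The variational inequality is tested on random points of $K$, not on all of $K$.

```python
import numpy as np

rng = np.random.default_rng(1)
lo, hi = -1.0, 1.0               # K = [lo, hi]^n, a closed convex subset of R^n

def proj_K(x):
    """Nearest point of the box K to x (componentwise clipping)."""
    return np.clip(x, lo, hi)

n = 5
x, y = 3.0 * rng.normal(size=(2, n))
u, w = proj_K(x), proj_K(y)

# Variational inequality <x - u, v - u> <= 0, checked on sample points v of K.
samples = rng.uniform(lo, hi, size=(1000, n))
worst = np.max((samples - u) @ (x - u))

# Lipschitz estimate ||proj_K x - proj_K y|| <= ||x - y||.
lipschitz_ok = np.linalg.norm(u - w) <= np.linalg.norm(x - y) + 1e-12
print(worst, lipschitz_ok)
```

For other convex sets only `proj_K` has to be replaced; for the closed unit ball, for instance, one would rescale points of norm greater than one.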


Proof For fixed $x \in \mathcal{H}$ consider the function $\psi_x : K \to \mathbb{R}$ defined by $\psi_x(z) =
\|x - z\|$ for all $z \in K$. It is certainly continuous and as a small calculation shows
(see Exercises), strictly convex. Lemma 33.3 implies that $\psi_x$ is weakly lower
semi-continuous on the closed convex and thus weakly closed set $K$. Therefore, in
the case that $K$ is bounded, Theorem 33.3 applies. If $K$ is not bounded, then certainly
$\psi_x$ is coercive, i.e., $\psi_x(z) \to \infty$ as $\|z\| \to \infty$, and thus Theorem 33.4 applies. In
both cases, we conclude that there is a point $u_x \in K$ which minimizes $\psi_x$ on $K$,

$$\psi_x(u_x) = \|x - u_x\| = \inf_{z \in K} \|x - z\|.$$

The minimizing point is unique since $\psi_x$ is strictly convex (Theorem 33.2). These
arguments apply to any $x \in \mathcal{H}$. Because $u_x$ is uniquely determined by $x$ we get a
well-defined map $p_K : \mathcal{H} \to K$ by

$$p_K(x) = \begin{cases} u_x, & x \in \mathcal{H} \setminus K, \\ x, & x \in K. \end{cases}$$


In order to prove the characteristic inequality for $u = u_x \in K$ take any $z \in K$. Since
$K$ is convex, $tz + (1 - t)u \in K$ and we know $\|x - u\| \le \|x - tz - (1 - t)u\|$ or

$$0 \le \|x - u + t(u - z)\|^2 - \|x - u\|^2 = 2t\langle x - u, u - z\rangle + t^2\|u - z\|^2$$

for all $0 \le t \le 1$. A standard argument implies $\langle x - u, u - z\rangle \ge 0$.
Conversely assume that $u \in K$ is a point for which this inequality holds for all
$z \in K$. Then on the basis of this inequality we estimate as follows:

$$\begin{aligned}
\|x - u\|^2 &= \langle x - u, x - u\rangle = \langle x - u, x - z\rangle + \langle x - u, x - u - (x - z)\rangle \\
&= \langle x - u, x - z\rangle + \langle x - u, z - u\rangle \le \langle x - u, x - z\rangle \\
&= \langle x - z, x - z\rangle + \langle z - u, x - z\rangle \\
&= \|x - z\|^2 + \langle z - u, x - u\rangle - \langle z - u, z - u\rangle \le \|x - z\|^2,
\end{aligned}$$

hence $\|x - u\| \le \|x - z\|$ for all $z \in K$ and thus $\|x - u\| = \inf_{z \in K} \|x - z\|$. We
conclude that the minimizing element $u_x = u$ is indeed characterized by the above
variational inequality.

Finally we prove Lipschitz continuity of the mapping $p_K$. Given $x, y \in \mathcal{H}$ denote
$u = p_K(x)$ and $v = p_K(y)$. Now apply the variational inequality first for $u$ and
$z = v \in K$ and then for $v$ and $z = u \in K$. This gives the two inequalities

$$\langle x - u, v - u\rangle \le 0, \qquad \langle y - v, u - v\rangle \le 0$$




which we add to get

$$\langle x - y + v - u, v - u\rangle \le 0$$

or

$$\|v - u\|^2 \le \langle y - x, v - u\rangle \le \|y - x\|\,\|v - u\|,$$

and the estimate $\|v - u\| \le \|y - x\|$ follows, which is just the continuity estimate
and we conclude.

In Part II the spectral theory for compact operators (Riesz—Schauder theory, The- 
orem 25.5) has been developed by using mainly Hilbert space intrinsic arguments. 
We discuss this proof now as an application of the direct methods. 

Let $\mathcal{H}$ be a separable Hilbert space and $A \ne 0$ a self-adjoint compact operator on
$\mathcal{H}$. Denote by $B_1 = \{x \in \mathcal{H} : \|x\| \le 1\}$ the closed unit ball of $\mathcal{H}$ and by $S_1$ the unit
sphere of this space. Recall that the norm of the operator can be expressed as

$$\|A\| = \sup_{u \in B_1} \|Au\| = \sup_{u \in S_1} |\langle u, Au\rangle|.$$

Thus, the calculation of the norm can be regarded as a problem of finding a
maximum of the function $u \mapsto \|Au\|$ on the closed unit ball $B_1$ or of the function
$u \mapsto |\langle u, Au\rangle|$ on the unit sphere $S_1$. Since $A$ is compact, it maps weakly convergent
sequences into norm convergent sequences and thus both functions are weakly
continuous. Since the closed unit ball $B_1$ is weakly compact we can apply Theorem
33.3 and thus get a point $e_1 \in B_1$ such that

$$\|A\| = \|Ae_1\| = \sup_{u \in B_1} \|Au\|.$$

Since $A \ne 0$ we know $e_1 \ne 0$ and thus $\|e_1\| = 1$ as a simple scaling argument
shows.

Consider the function $f(u) = \langle u, Au\rangle$ on $\mathcal{H}$. It is Fréchet differentiable with
derivative $f'(u)(x) = 2\langle Au, x\rangle$ for all $x \in \mathcal{H}$. As we have shown above, this function
has a maximum on $S_1$ given by

$$\sup_{u \in S_1} f(u) = \pm\|A\| = \pm f(e_1).$$

The unit sphere $S_1$ is the level surface $[g = 1]$ of the constraint function $g(x) = \langle x, x\rangle$
on $\mathcal{H}$ which is Fréchet differentiable too, and its derivative is $g'(u)(x) = 2\langle u, x\rangle$ for
all $x \in \mathcal{H}$. Therefore, all points $u \in S_1$ are regular points of $g$. The results on
the existence of a Lagrange multiplier (Theorem 35.2 and Corollary 35.1) apply,
i.e., there is a $\lambda_1 \in \mathbb{R}$ such that $f'(e_1) = \lambda_1 g'(e_1)$ or $Ae_1 = \lambda_1 e_1$. It follows that
$|\lambda_1| = \|A\|$.

Then the proof is completed as it has been shown in Part B, Sect. 25.1. This 
ends our remarks on the proof of the spectral theorem for compact operators as an 
application of variational methods. 

Theorem 25.5 establishes the existence and some of the properties of eigenvalues 
of a compact self-adjoint operator. The above comments on the proof hint at a method




for calculating these eigenvalues. And indeed this method has been worked out in 
full detail and leads to the classical minimax principle of Courant—Weyl—Fischer— 
Poincaré—Rayleigh-Ritz. 


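Theorem 36.2 (Minimax Principle) Let $\mathcal{H}$ be a real separable Hilbert space and
$A \ge 0$ a self-adjoint operator on $\mathcal{H}$ with spectrum $\sigma(A) = \{\lambda_m : m \in \mathbb{N}\}$ ordered
according to size, $\lambda_m \le \lambda_{m+1}$. For $m = 1, 2, \dots$ denote by $\mathcal{E}_m$ the family of all
$m$-dimensional subspaces $E_m$ of $\mathcal{H}$. Then the eigenvalue $\lambda_m$ can be calculated as

$$\lambda_m = \min_{E_m \in \mathcal{E}_m} \max_{v \in E_m} \frac{\langle v, Av\rangle}{\langle v, v\rangle}. \tag{36.2}$$
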
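Proof The proof is obtained by determining the lower bound for the values of the
Rayleigh quotient $R(v) = \frac{\langle v, Av\rangle}{\langle v, v\rangle}$. In order to do this we expand every $v \in \mathcal{H}$ in terms
of eigenvectors $e_i$ of $A$. This gives $v = \sum_{i=1}^{\infty} a_i e_i$ and $\langle v, v\rangle = \sum_{i=1}^{\infty} a_i^2$. In this form
the Rayleigh quotient reads

$$R(v) = \frac{\sum_{i=1}^{\infty} \lambda_i a_i^2}{\sum_{i=1}^{\infty} a_i^2}.$$

Denote by $V_m$ the linear subspace generated by the first $m$ eigenvectors of $A$. It
follows that

$$\max_{v \in V_m} R(v) = \max_{(a_1, \dots, a_m) \in \mathbb{R}^m} \frac{\sum_{i=1}^{m} \lambda_i a_i^2}{\sum_{i=1}^{m} a_i^2} = \lambda_m = R(e_m),$$

and thus we are left with showing $\max_{v \in E_m} R(v) \ge \lambda_m$ for every other subspace
$E_m \in \mathcal{E}_m$. Let $E_m \ne V_m$ be such a subspace. Since $\dim E_m = m$ exceeds
$\dim V_{m-1} = m - 1$ (with $V_0 = \{0\}$), we have $E_m \cap V_{m-1}^{\perp} \ne \{0\}$ and therefore

$$\max_{v \in E_m} R(v) \ge \max_{v \in E_m \cap V_{m-1}^{\perp}} R(v).$$

Every $v \in E_m \cap V_{m-1}^{\perp}$ is of the form $v = \sum_{i=m}^{\infty} a_i e_i$ and for such vectors we have

$$R(v) = \frac{\sum_{i=m}^{\infty} \lambda_i a_i^2}{\sum_{i=m}^{\infty} a_i^2} \ge \lambda_m.$$

This then completes the proof.
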
Theorem 36.2 implies for the smallest eigenvalue of the operator $A$ the simple
formula

$$\lambda_1 = \min_{v \in \mathcal{H},\, v \ne 0} \frac{\langle v, Av\rangle}{\langle v, v\rangle}. \tag{36.3}$$
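The minimax formula (36.2) can be observed for matrices. The sketch below is an illustration under our own assumptions: $A$ is a random positive semidefinite $8 \times 8$ matrix, $m = 3$, and only 200 random subspaces are sampled, so it illustrates the inequality rather than proving the minimum. The maximum of the Rayleigh quotient over a subspace with orthonormal basis `Q` is the largest eigenvalue of `Q.T @ A @ Q`.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 3
B = rng.normal(size=(n, n))
A = B @ B.T                                   # symmetric with A >= 0
eigvals, eigvecs = np.linalg.eigh(A)          # eigenvalues in ascending order

def max_rayleigh(Q):
    """Maximum of the Rayleigh quotient over the span of the orthonormal columns of Q."""
    return np.linalg.eigvalsh(Q.T @ A @ Q)[-1]

# V_m = span of the first m eigenvectors attains the value lambda_m.
value_on_Vm = max_rayleigh(eigvecs[:, :m])

# Random m-dimensional subspaces never give a smaller maximum.
random_maxima = []
for _ in range(200):
    Q, R = np.linalg.qr(rng.normal(size=(n, m)))
    random_maxima.append(max_rayleigh(Q))

print(eigvals[m - 1], value_on_Vm, min(random_maxima))
```

Setting `m = 1` reproduces formula (36.3) for the smallest eigenvalue.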




36.2 The Dirichlet-Laplace Operator and Other Elliptic 
Differential Operators 


The goal of this section is to illustrate the application of the general strategy and the 
results developed thus far. This is done by solving several relatively simple linear 
boundary and eigenvalue problems. The typical example is the Laplace operator with 
Dirichlet boundary conditions on a bounded domain $\Omega$. Naturally, for these concrete
problems we have to use concrete function spaces, and we need to know a number 
of basic facts about them. In this brief introduction, we have to refer the reader to 
the literature for the proof of these facts. We recommend the books [1-3]. 

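For a bounded domain $\Omega \subset \mathbb{R}^n$ with smooth boundary $\partial\Omega$ consider the real
Hilbert space $L^2(\Omega)$ with inner product $\langle\cdot,\cdot\rangle_2$. Recall the definition of the Sobolev
space $H^1(\Omega)$ (see Chap. 13) as

$$H^1(\Omega) = \{u \in L^2(\Omega) : \partial_j u \in L^2(\Omega),\ j = 1, \dots, n\}. \tag{36.4}$$

Here naturally the partial derivatives $\partial_j u$ are understood in the weak (distributional)
sense. One shows that $H^1(\Omega)$ is a Hilbert space with the inner product

$$\langle u, v\rangle = \langle u, v\rangle_2 + \langle Du, Dv\rangle_2 \quad \forall u, v \in H^1(\Omega) \tag{36.5}$$

where $Du = (\partial_1 u, \dots, \partial_n u)$ and where in the second term the natural inner product of
$L^2(\Omega)^{\times n}$ is used. This space is the Sobolev space $W^{1,2}(\Omega)$. Next define a subspace
of this space:

$$H_0^1(\Omega) = \text{closure of } \mathcal{D}(\Omega) \text{ in } H^1(\Omega). \tag{36.6}$$

Intuitively, $H_0^1(\Omega)$ is the subspace of those $u \in H^1(\Omega)$ whose restriction to the
boundary $\partial\Omega$ vanishes, $u|_{\partial\Omega} = 0$.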

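The Sobolev space $H^1(\Omega)$ is by definition contained in the Hilbert space $L^2(\Omega)$;
however, the following compact embeddings for $2 \le n$ are of greater importance to
us,

$$H^1(\Omega) \hookrightarrow L^q(\Omega), \quad 1 \le q < 2^* = \frac{2n}{n-2}, \quad 2 < n \tag{36.7}$$

and

$$H^1(\Omega) \hookrightarrow L^q(\Omega), \quad 1 \le q < \infty, \quad 2 = n. \tag{36.8}$$

This means that every weakly convergent sequence in $H^1(\Omega)$ converges strongly in
$L^q(\Omega)$. In addition we are going to use the important Sobolev inequality

$$\|u\|_q \le S\,\|Du\|_2 = S\Big(\sum_{j=1}^{n} \|\partial_j u\|_2^2\Big)^{1/2} \quad \forall u \in H^1(\Omega), \tag{36.9}$$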




where $S$ is the Sobolev constant depending on $q, n$ and where $q$ is in the range
indicated in (36.7), respectively (36.8).

Now we are in the position to show that the famous Dirichlet problem has a 
solution. 


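Theorem 36.3 (Dirichlet Problem) Let $\Omega \subset \mathbb{R}^n$ be a bounded open set with smooth
boundary and $v_0 \in H^1(\Omega)$ some given element. Then the Dirichlet integral

$$f(v) = \int_\Omega |Dv(x)|^2\,dx = \int_\Omega \sum_{j=1}^{n} |\partial_j v(x)|^2\,dx \tag{36.10}$$

is minimized on $M = v_0 + H_0^1(\Omega)$ by an element $v \in M$ satisfying

$$\Delta v = 0 \ \text{ in } \Omega \quad \text{and} \quad v|_{\partial\Omega} = v_0|_{\partial\Omega}. \tag{36.11}$$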


Proof Observe that $f(u) = Q(u,u)$ with the quadratic functional $Q(u,v) =
\langle Du, Dv\rangle_2$. This quadratic form satisfies, because of inequality (36.9), the estimate

$$c\,\|u\|^2 \le Q(u,u) \le \|u\|^2$$

for some $c > 0$. It follows that $Q$ is a strictly positive continuous quadratic form
on $H^1(\Omega)$ and thus $f$ is a strictly convex continuous function on this space (see the
proof of Theorem 33.5). We conclude, by Lemma 33.3 or Theorem 33.5, that $f$ is
weakly lower semi-continuous on $H^1(\Omega)$.

As a Hilbert space, $H_0^1(\Omega)$ is weakly complete and thus the set $M = v_0 + H_0^1(\Omega)$
is weakly closed. Therefore, Theorem 33.4 applies and we conclude that there is a
minimizing element $v$ for the functional $f$ on $M$.

Since the minimizing element $v \in M$ satisfies $f(v) = f(v_0 + u) \le f(v_0 + w)$ for
all $w \in H_0^1(\Omega)$, we deduce as earlier that $f'(v)(w) = 0$ for all $w \in H_0^1(\Omega)$ and thus

$$0 = f'(v)(w) = 2\int_\Omega Dv(x) \cdot Dw(x)\,dx \quad \forall w \in \mathcal{D}(\Omega).$$

Recalling the definition of differentiation in the sense of distributions, this means
$-\Delta v = 0$ in the sense of $\mathcal{D}'(\Omega)$. Now the Lemma of Weyl (see [2, 3]) implies that
$-\Delta v = 0$ also holds in the classical sense, i.e., as an identity for functions of class
$C^2$.

Because for $u \in H_0^1(\Omega)$ one has $u|_{\partial\Omega} = 0$, the minimizer $v$ satisfies the boundary
condition too. Thus we conclude.
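A one dimensional discrete analogue shows the mechanism of this proof. The sketch below is an illustration under our own assumptions: $\Omega = (0,1)$, a uniform grid with 50 interior points, boundary values 1 and 3 playing the role of $v_0$, and a finite-difference Dirichlet integral. Stationarity of the discrete energy gives the discrete Laplace equation, whose solution in one dimension is the linear interpolant of the boundary values.

```python
import numpy as np

N = 50                           # interior grid points of (0, 1)
h = 1.0 / (N + 1)
alpha, beta = 1.0, 3.0           # boundary values v(0), v(1), fixed by v0

def energy(interior):
    """Discrete Dirichlet integral: sum of ((v[i+1] - v[i]) / h)^2 * h."""
    v = np.concatenate(([alpha], interior, [beta]))
    return np.sum(np.diff(v) ** 2) / h

# f'(v)(w) = 0 for all interior w gives -v[i-1] + 2 v[i] - v[i+1] = 0.
L = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
rhs = np.zeros(N)
rhs[0], rhs[-1] = alpha, beta
v = np.linalg.solve(L, rhs)

grid = np.linspace(0.0, 1.0, N + 2)[1:-1]
rng = np.random.default_rng(3)
# Perturbations that vanish on the boundary can only increase the energy.
perturbed = [energy(v + 0.1 * rng.normal(size=N)) for _ in range(100)]
print(energy(v), min(perturbed))
```

The computed minimizer should coincide with `alpha + (beta - alpha) * grid`, the discrete harmonic function with the prescribed boundary values.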

As a simple application of the theory of constrained minimization, we solve the
eigenvalue problem for the Laplace operator on an open bounded domain $\Omega$ with
Dirichlet boundary conditions, i.e., the problem is to find a number $\lambda$ and a function
$u \ne 0$ satisfying

$$-\Delta u = \lambda u \ \text{ in } \Omega, \qquad u|_{\partial\Omega} = 0. \tag{36.12}$$




The strategy is simple. On the Hilbert space $H_0^1(\Omega)$, we minimize the functional
$f(u) = \frac{1}{2}\langle Du, Du\rangle_2$ under the constraint $g(u) = \frac{1}{2}$ for the constraint functional
$g(u) = \frac{1}{2}\langle u, u\rangle_2$. The derivative of $g$ is easily calculated; it is $g'(u)(v) = \langle u, v\rangle_2$ for
all $v \in H_0^1(\Omega)$ and thus the level surface $[g = \frac{1}{2}]$ consists only of regular points of
the mapping $g$.

Since we know that $f$ is weakly lower semi-continuous and coercive on $H_0^1(\Omega)$,
we can prove the existence of a minimizer for the functional $f$ on $[g = \frac{1}{2}]$ by verifying
that $[g = \frac{1}{2}]$ is weakly closed and then applying Theorem 33.4.

Suppose a sequence $(u_j)_{j \in \mathbb{N}}$ converges to $u$ weakly in $H_0^1(\Omega)$. According to the
Sobolev embedding (36.7) the space $H_0^1(\Omega)$ is compactly embedded into the space
$L^2(\Omega)$ and thus this sequence converges strongly in $L^2(\Omega)$ to $u$. It follows that
$g(u_j) \to g(u)$ as $j \to \infty$, i.e., $g$ is weakly continuous on $H_0^1(\Omega)$ and its level
surfaces are weakly closed.

Theorem 33.4 implies the existence of a minimizer of $f$ under the constraint
$g(u) = 1/2$. Using Corollary 35.1 and Theorem 35.2, we deduce that there is a
Lagrange multiplier $\lambda \in \mathbb{R}$ for this constrained minimization problem, i.e., a real
number $\lambda$ satisfying $f'(u) = \lambda g'(u)$. In detail this identity reads

$$\int_\Omega Du(x) \cdot Dv(x)\,dx = \lambda \int_\Omega u(x)v(x)\,dx \quad \forall v \in H_0^1(\Omega),$$

and in particular for all $v \in \mathcal{D}(\Omega)$, thus $-\Delta u = \lambda u$ in $\mathcal{D}'(\Omega)$; and by elliptic
regularity theory (see for instance Sect. 9.3 of [3]) we conclude that this identity
holds in the classical sense. Since the solution $u$ belongs to the space $H_0^1(\Omega)$ it
satisfies the boundary condition $u|_{\partial\Omega} = 0$. This proves:


Theorem 36.4 (Dirichlet Laplacian) Let $\Omega \subset \mathbb{R}^n$ be a bounded open set with
smooth boundary $\partial\Omega$. Then the eigenvalue problem for the Laplace operator with
Dirichlet boundary conditions (36.12) has a solution.

The above argument which proved the existence of the lowest eigenvalue $\lambda_1$ of
the Dirichlet-Laplace operator can be repeated on the orthogonal complement of the
eigenfunction $u_1$ of the first eigenvalue and thus gives an eigenvalue $\lambda_2 \ge \lambda_1$ (some
additional arguments show $\lambda_2 > \lambda_1$). In this way, one actually proves the existence of
an infinite sequence of eigenvalues for the Dirichlet-Laplace operator. By involving
some refined methods of the theory of Hilbert space operators, it can be shown that
these eigenvalues are of the order $\lambda_j \approx \text{constant}\,(j/|\Omega|)^{2/n}$ (see for instance [1]).
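For a concrete impression, the eigenvalue problem (36.12) can be discretized in one dimension. The sketch below is an illustration under our own assumptions: $\Omega = (0, \pi)$, where the exact Dirichlet eigenvalues are known to be $k^2$, and a standard second-order finite-difference matrix on 400 interior points. It also checks that the discrete Rayleigh quotient of the first eigenvector, normalized in the discrete $L^2$ norm, equals the smallest eigenvalue.

```python
import numpy as np

N = 400                                   # interior grid points of (0, pi)
h = np.pi / (N + 1)
# Finite-difference matrix of -d^2/dx^2 with u(0) = u(pi) = 0.
L = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2
eigvals, eigvecs = np.linalg.eigh(L)

u1 = eigvecs[:, 0] / np.sqrt(h)           # normalised so that sum(u1^2) * h = 1
du = np.diff(np.concatenate(([0.0], u1, [0.0])))
rayleigh = np.sum(du**2) / h              # discrete version of <Du, Du>_2

print(eigvals[:3], rayleigh)
```

The first three computed eigenvalues should be close to 1, 4 and 9, with a discretization error of order $h^2$.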

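Next we consider more generally the following class of second-order linear partial
differential operators $A$ defined on sufficiently smooth functions $u$ by

$$Au = A_0 u - \sum_{j=1}^{n} \partial_j\Big(\sum_{i=1}^{n} a_{ji}\,\partial_i u\Big). \tag{36.13}$$

The matrix $a$ of coefficient functions $a_{ji} = a_{ij} \in L^\infty(\Omega)$ satisfies for almost all
$x \in \Omega$ and all $\xi \in \mathbb{R}^n$,

$$m \sum_{j=1}^{n} \xi_j^2 \le \sum_{i,j=1}^{n} \xi_j\, a_{ji}(x)\, \xi_i \le M \sum_{j=1}^{n} \xi_j^2 \tag{36.14}$$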




for some constants $0 < m < M$. $A_0$ is a bounded symmetric operator in $L^2(\Omega)$
which is bounded from below, $\langle u, A_0 u\rangle_2 \ge -r\|u\|_2^2$ for some positive number $r$
satisfying $0 \le r < \frac{m}{c^2}$. Here $m$ is the constant in condition (36.14) and $c$ is the
smallest constant for which $\|u\|_2 \le c\,\|Du\|_2$ holds for all $u \in H_0^1(\Omega)$.

As we are going to show, under these assumptions, the arguments used for the
study of the Dirichlet problem and the eigenvalue problem for the Dirichlet-Laplace
operator still apply. The associated quadratic form

$$Q(u,v) = \langle u, A_0 v\rangle_2 + \sum_{i,j=1}^{n} \langle \partial_j v, a_{ji}\,\partial_i u\rangle_2 \quad \forall u, v \in H_0^1(\Omega)$$

is strictly positive since the ellipticity condition (36.14) and the lower bound for $A_0$
imply

$$\begin{aligned}
Q(u,u) &= \langle u, A_0 u\rangle_2 + \int_\Omega \sum_{i,j=1}^{n} \partial_j u(x)\, a_{ji}(x)\, \partial_i u(x)\,dx \\
&\ge -r\,\|u\|_2^2 + \int_\Omega m \sum_{j=1}^{n} (\partial_j u(x))^2\,dx = -r\,\|u\|_2^2 + m\,\|Du\|_2^2 \\
&\ge (-rc^2 + m)\,\|Du\|_2^2 = c_0\,\|Du\|_2^2, \qquad c_0 = -rc^2 + m > 0.
\end{aligned}$$

As earlier we deduce that the functional $f(u) = Q(u,u)$ is coercive and weakly lower
semi-continuous on $H_0^1(\Omega)$. Hence, Theorem 33.4 allows us to minimize $f$ on $M =
v_0 + H_0^1(\Omega)$ and thus to solve the boundary value problem for a given $v_0 \in H^1(\Omega)$ or
on the level surface $[g = \frac{1}{2}]$ for the constraint function $g(u) = \frac{1}{2}\langle u, u\rangle_2$ on $H_0^1(\Omega)$.
The conclusion is that the linear elliptic partial differential operator (36.13) with
Dirichlet boundary conditions has an increasing sequence of eigenvalues, as is the
case for the Laplace operator.


36.3. Nonlinear Convex Problems 


In order to be able to minimize functionals of the general form (32.2), we first have to 
find a suitable domain of definition and then to have enough information about it. We 
begin with the description of several important aspects from the theory of Lebesgue 
spaces. A good reference for this are paragraphs 18-20 of [4]. 

Let $\Omega \subset \mathbb{R}^n$ be a nonempty open set and $h : \Omega \times \mathbb{R} \to \mathbb{R}$ a function such
that $h(\cdot, y)$ is measurable on $\Omega$ for every $y \in \mathbb{R}$ and $y \mapsto h(x, y)$ is continuous for
almost every $x \in \Omega$. Such functions are often called Carathéodory functions. If now
$u : \Omega \to \mathbb{R}$ is (Lebesgue) measurable, define $h(u) : \Omega \to \mathbb{R}$ by $h(u)(x) = h(x, u(x))$
for almost every $x \in \Omega$. Then $h(u)$ is measurable too. For our purpose it is enough to
consider $h$ on Lebesgue integrable functions $u \in L^p(\Omega)$ and we need that the image
$h(u)$ is Lebesgue integrable too, for instance $h(u) \in L^q(\Omega)$ for some exponents
$1 \le p, q$. Therefore the following lemma will be useful.




Lemma 36.1 Suppose that $\Omega \subset \mathbb{R}^n$ is a bounded open set and $h : \Omega \times \mathbb{R} \to \mathbb{R}$
a Carathéodory function. Then $h$ maps $L^p(\Omega)$ into $L^q(\Omega)$, if and only if, there are
$0 \le a \in L^q(\Omega)$ and $b \ge 0$ such that for almost all $x \in \Omega$ and all $y \in \mathbb{R}$,

$$|h(x, y)| \le a(x) + b\,|y|^{p/q}. \tag{36.15}$$

If this condition holds the map $h : L^p(\Omega) \to L^q(\Omega)$ is continuous.
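The easy half of the lemma, that the growth bound (36.15) is sufficient for $h$ to map $L^p(\Omega)$ into $L^q(\Omega)$, can be verified in one line. The following derivation is a sketch of that direction only; it uses Minkowski's inequality in $L^q(\Omega)$ for $q \ge 1$ and is not the proof referred to in the literature cited above.

```latex
\|h(u)\|_q
  \le \bigl\| a + b\,|u|^{p/q} \bigr\|_q
  \le \|a\|_q + b\,\bigl\| |u|^{p/q} \bigr\|_q
  = \|a\|_q + b \Bigl( \int_\Omega |u(x)|^{p}\,dx \Bigr)^{1/q}
  = \|a\|_q + b\,\|u\|_p^{p/q} .
```

Hence $h(u) \in L^q(\Omega)$ whenever $u \in L^p(\Omega)$. The converse implication and the continuity statement require more work and are not covered by this sketch.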

This result extends naturally to Carathéodory functions $h : \Omega \times \mathbb{R}^{n+1} \to \mathbb{R}$. For
$u_j \in L^{p_j}(\Omega)$, $j = 0, 1, \dots, n$, define $h(u_0, \dots, u_n)(x) = h(x, u_0(x), \dots, u_n(x))$ for
almost every $x \in \Omega$. Then $h : L^{p_0}(\Omega) \times \cdots \times L^{p_n}(\Omega) \to L^q(\Omega)$, if and only if,
there are $0 \le a \in L^q(\Omega)$ and $b \ge 0$ such that

$$|h(x, y_0, \dots, y_n)| \le a(x) + b \sum_{j=0}^{n} |y_j|^{p_j/q}. \tag{36.16}$$

And $h$ is continuous if this condition holds.

As a last preparation define, for every $u \in W^{1,p}(\Omega)$, the functions $y(u) =
(y_0(u), y_1(u), \dots, y_n(u))$ where $y_0(u) = u$ and $y_j(u) = \partial_j u$ for $j = 1, \dots, n$. By
definition of the Sobolev space $W^{1,p}(\Omega)$ we know that

$$y : W^{1,p}(\Omega) \to L^p(\Omega) \times \cdots \times L^p(\Omega) = L^p(\Omega)^{\times (n+1)}$$

is a continuous linear map.
Now suppose that the integrand in formula (32.2) is a Carathéodory function and
satisfies the bound

$$|F(x, y)| \le a(x) + b \sum_{j=0}^{n} |y_j|^p, \tag{36.17}$$

for all $y \in \mathbb{R}^{n+1}$ and almost all $x \in \Omega$, for some $0 \le a \in L^1(\Omega)$ and some
constant $b \ge 0$. Then, as a composition of continuous mappings, $F \circ y$ is a
well-defined continuous mapping $W^{1,p}(\Omega) \to L^1(\Omega)$. We conclude that under the growth
restriction (36.17) the Sobolev space $W^{1,p}(\Omega)$ is a suitable domain for the functional

$$f(u) = \int_\Omega F(x, u(x), Du(x))\,dx. \tag{36.18}$$
Q 


For $1 < p < \infty$, the Sobolev spaces $W^{1,p}(\Omega)$ are known to be separable reflexive
Banach spaces, and thus well suited for the direct methods ([1]).


Proposition 36.1 Let $\Omega \subset \mathbb{R}^n$ be a bounded open set and $F : \Omega \times \mathbb{R}^{n+1} \to \mathbb{R}$ a
Carathéodory function.


(a) If $F$ satisfies the growth restriction (36.17), then a functional $f : W^{1,p}(\Omega) \to \mathbb{R}$
is well defined by (36.18). It is polynomially bounded according to

$$|f(u)| \le \|a\|_1 + b\,\|u\|_p^p + b\,\|Du\|_p^p \quad \forall u \in W^{1,p}(\Omega). \tag{36.19}$$




(b) If $F$ satisfies a lower bound of the form

$$F(x, y) \ge -a(x) - \beta\,|y_0|^r + c\,|\underline{y}|^p \tag{36.20}$$

for all $y = (y_0, \underline{y}) \in \mathbb{R}^{n+1}$ and almost all $x \in \Omega$, for some $0 \le a \in L^1(\Omega)$,
$\beta \ge 0$, $c > 0$ and $0 \le r < p$, then the functional $f$ is coercive.

(c) If $y \mapsto F(x, y)$ is convex for almost all $x \in \Omega$, then $f$ is lower semi-continuous
for the weak topology on $W^{1,p}(\Omega)$.


Proof To complete the proof of Part (a) we note that the assumed bound for $F$
implies that $|F \circ y(u)(x)| \le a(x) + b \sum_{j=0}^{n} |y_j(u)(x)|^p$ and thus by integration the
polynomial bound follows.

Integration of the lower bound $F(x, u(x), Du(x)) \ge -a(x) - \beta|u(x)|^r + c|Du(x)|^p$
for almost all $x \in \Omega$ gives $f(u) \ge -\|a\|_1 - \beta\|u\|_r^r + c\,\|Du\|_p^p$. By inequality (36.9),
$\|u\|_r^r \le S^r \|Du\|_p^r$, hence $f(u) \to \infty$ as $\|Du\|_p \to \infty$ since $r < p$ and $c > 0$.

For any $u, v \in W^{1,p}(\Omega)$ and $0 \le t \le 1$ we have $F \circ y(tu + (1-t)v) = F(t\,y(u) +
(1-t)\,y(v)) \le t\,F(y(u)) + (1-t)\,F(y(v))$ since $F$ is assumed to be convex with
respect to $y$. Hence, integration over $\Omega$ gives $f(tu + (1-t)v) \le t f(u) + (1-t) f(v)$.
This shows that $f$ is a convex functional. According to Part (a), $f$ is continuous on
$W^{1,p}(\Omega)$, therefore Lemma 33.3 implies that $f$ is weakly lower semi-continuous on
$W^{1,p}(\Omega)$.

Let us remark that the results presented in Part (c) of Proposition 36.1 are not 
optimal (see for instance [2, 5, 6]). But certainly the result given above has the 
advantage of a very simple proof. The above result uses stronger assumptions insofar 
as convexity with respect to u and Du is used whereas in fact convexity with respect 
to Du is sufficient. 

Suppose we are given a functional $f$ of the form (36.18) for which parts (a) and
(c) of Proposition 36.1 apply. Then, by Theorem 33.3 we can minimize $f$ on any
bounded weakly closed subset $M \subset W^{1,p}(\Omega)$. If in addition $f$ is coercive, i.e., if
Part (b) of Proposition 36.1 applies too, then we can minimize $f$ on any weakly
closed subset $M \subset W^{1,p}(\Omega)$.

In order to relate these minimizing points to solutions of nonlinear partial differ- 
ential operators, we need differentiability of the functional f. For this we will not 
consider the most general case but make assumptions which are typical and allow a 
simple proof. 

Let us assume that the integrand $F$ of the functional $f$ is of class $C^1$ and that all
derivatives $F_j = \frac{\partial F}{\partial y_j}$ are again Carathéodory functions. Assume furthermore that
there are functions $0 \le a_j \in L^{p'}(\Omega)$ and constants $b_j > 0$ such that for all $y \in \mathbb{R}^{n+1}$
and almost all $x \in \Omega$,

$$|F_j(x, y)| \le a_j(x) + b_j \sum_{i=0}^{n} |y_i|^{p-1}, \quad j = 0, 1, \dots, n \tag{36.21}$$

where $p'$ denotes the Hölder conjugate exponent, $\frac{1}{p} + \frac{1}{p'} = 1$. Since $(p - 1)p' = p$
we get for all $u \in W^{1,p}(\Omega)$ the simple identity $\bigl\||y_i(u)|^{p-1}\bigr\|_{p'}^{p'} = \|y_i(u)\|_p^p$ and it




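follows that $F_j(y(u)) \in L^{p'}(\Omega)$ for all $u \in W^{1,p}(\Omega)$ and $j = 0, 1, \dots, n$. This implies
the estimates, for all $u, v \in W^{1,p}(\Omega)$,

$$\|F_j(y(u))\,y_j(v)\|_1 \le \|F_j(y(u))\|_{p'}\,\|y_j(v)\|_p, \quad j = 0, 1, \dots, n$$

and thus

$$v \mapsto \int_\Omega \sum_{j=0}^{n} F_j(x, y(u)(x))\,y_j(v)(x)\,dx \tag{36.22}$$

is a continuous linear functional on $W^{1,p}(\Omega)$, for every $u \in W^{1,p}(\Omega)$. Now it is
straightforward (see Exercises) to calculate the derivative of the functional $f$, by
using Taylor's Theorem. The result is the functional

$$f'(u)(v) = \int_\Omega \sum_{j=0}^{n} F_j(x, y(u)(x))\,y_j(v)(x)\,dx \quad \forall u, v \in W^{1,p}(\Omega). \tag{36.23}$$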
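As an illustration of formula (36.23), consider the integrand $F(x, y) = \frac{1}{p}\sum_{j=0}^{n} |y_j|^p$ with $1 < p < \infty$. This particular $F$ is our own choice for the example and is not taken from the text. Its partial derivatives and the resulting derivative of $f$ are:

```latex
F_j(x,y) = |y_j|^{p-2}\, y_j , \qquad
|F_j(x,y)| = |y_j|^{p-1} \le \sum_{i=0}^{n} |y_i|^{p-1} ,
\qquad j = 0,1,\dots,n ,
\\[1ex]
f'(u)(v) = \int_\Omega \Bigl( |u|^{p-2} u \, v
  + \sum_{j=1}^{n} |\partial_j u|^{p-2}\, \partial_j u \, \partial_j v \Bigr)\, dx .
```

So the growth restriction (36.21) holds with $a_j = 0$ and $b_j = 1$. For $p = 2$ the derivative reduces to $f'(u)(v) = \langle u, v\rangle_2 + \langle Du, Dv\rangle_2$, which is the inner product (36.5).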
j=0 


As further preparation for the solution of nonlinear eigenvalue problems we
specify the relevant properties of the class of constraint functionals

$$g(u) = \int_\Omega G(x, u(x))\,dx, \quad u \in W^{1,p}(\Omega) \tag{36.24}$$

which we are going to use. Here $G$ is a Carathéodory function which has a derivative
$G_0 = \frac{\partial G}{\partial u}$ which itself is a Carathéodory function. Since we are working on the
space $W^{1,p}(\Omega)$ we assume the following growth restrictions. There are functions
$0 \le \alpha \in L^1(\Omega)$ and $0 \le \alpha_0 \in L^{p'}(\Omega)$ and constants $0 \le \beta, \beta_0$ such that for all
$u \in \mathbb{R}$ and almost all $x \in \Omega$,

$$|G(x, u)| \le \alpha(x) + \beta\,|u|^q, \qquad |G_0(x, u)| \le \alpha_0(x) + \beta_0\,|u|^{q-1} \tag{36.25}$$

with an exponent $q$ satisfying $2 \le q < p^*$. Because of Sobolev's inequality (36.9)
the functional $g$ is well defined and continuous on $W^{1,p}(\Omega)$ and its absolute values
are bounded by $|g(u)| \le \|\alpha\|_1 + \beta\,\|u\|_q^q$.

Since $2 \le q < p^*$ there is an exponent $1 < r < p^*$ such that $(q - 1)r' < p^*$
(in the Exercises the reader is asked to show that any choice of $r$ with $\frac{p^*}{p^* + 1 - q} <
r < p^*$ satisfies this requirement). Then Hölder's inequality implies $\bigl\||u|^{q-1} v\bigr\|_1 \le
\bigl\||u|^{q-1}\bigr\|_{r'}\,\|v\|_r$. Therefore the bound for $G_0$ shows that for every $u \in W^{1,p}(\Omega)$ the
functional $v \mapsto \int_\Omega G_0(x, u(x))\,v(x)\,dx$ is well defined and continuous on $W^{1,p}(\Omega)$.
Now it is straightforward to show that the functional $g$ is Fréchet differentiable on
$W^{1,p}(\Omega)$ with derivative

$$g'(u)(v) = \int_\Omega G_0(x, u(x))\,v(x)\,dx \quad \forall u, v \in W^{1,p}(\Omega). \tag{36.26}$$

Finally, we assume that $g$ has a level surface $[g = c]$ with the property that $g'(u) \ne 0$
for all $u \in [g = c]$.




A simple example of a function G for which all the assumptions formulated above
are easily verified is $G(x,u) = a u^2$ for some constant $a > 0$. Then all level surfaces
$[g = c]$, $c > 0$, only contain regular points of g.
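The derivative formula (36.26) can be checked numerically for this example. The following is only a minimal sketch under stated assumptions: the domain is taken to be the interval (0, 1), the constant a = 3 and the sample functions u and v are hypothetical choices, and the integrals are approximated by a midpoint rule. It compares the formula with $G_0(x,u) = 2au$ against a difference quotient of $t \mapsto g(u + tv)$.

```python
import math

def integrate(fn, lo=0.0, hi=1.0, steps=2000):
    """Composite midpoint rule for the integral of fn over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(steps))

A = 3.0  # hypothetical constant a > 0

def u(x):
    """Sample function vanishing on the boundary of (0, 1) (assumption)."""
    return math.sin(math.pi * x)

def v(x):
    """Sample direction of differentiation (assumption)."""
    return x * (1.0 - x)

def g(w):
    """Constraint functional g(w) = integral of a * w(x)^2 over (0, 1)."""
    return integrate(lambda x: A * w(x) ** 2)

def g_prime(w, direction):
    """Formula (36.26) with G_0(x, u) = 2 a u."""
    return integrate(lambda x: 2.0 * A * w(x) * direction(x))

def difference_quotient(w, direction, t=1e-6):
    """Central difference quotient of t -> g(w + t * direction) at t = 0."""
    plus = g(lambda x: w(x) + t * direction(x))
    minus = g(lambda x: w(x) - t * direction(x))
    return (plus - minus) / (2.0 * t)

exact = g_prime(u, v)
approx = difference_quotient(u, v)
print(exact, approx)
```

Since g is quadratic here, the two numbers agree up to rounding. The value is nonzero, which illustrates that this particular u is a regular point of g.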

The nonlinear eigenvalue problems which can be solved by the strategy indicated
above are those of divergence type, i.e., those which are of the form (36.27) below.


Theorem 36.5 [Nonlinear eigenvalue problem] Let $\Omega \subset \mathbb{R}^n$ be a bounded open set
with smooth boundary $\partial\Omega$ and $F : \Omega \times \mathbb{R}^{n+1} \to \mathbb{R}$ a Carathéodory function which
satisfies all the hypotheses of Proposition 36.1 and in addition the growth restrictions
(36.21) for its derivatives $F_j$. Furthermore, let $G : \Omega \times \mathbb{R} \to \mathbb{R}$ be a Carathéodory
function with derivative $G_0$ which satisfies the growth conditions (36.25). Finally,
assume that the constraint functional g defined by G has a level surface $[g = c]$
which consists of regular points of g. Then the nonlinear eigenvalue problem

$$F_0(x, u(x), Du(x)) - \sum_{j=1}^{n} \partial_j F_j(x, u(x), Du(x)) = \lambda\, G_0(x, u(x)) \tag{36.27}$$

with Dirichlet boundary conditions has a nontrivial solution $u \in W_0^{1,p}(\Omega)$.


Proof Because of the Dirichlet boundary conditions we consider the functionals f
and g on the closed subspace

$$E = W_0^{1,p}(\Omega) = \text{closure of } \mathcal{D}(\Omega) \text{ in } W^{1,p}(\Omega). \tag{36.28}$$

Proposition 36.1 implies that f is a coercive continuous and weakly lower semi-
continuous functional on E. The derivative of f is given by the restriction of the
identity (36.23) to E.

Similarly, the functional g is defined and continuous on E and its derivative is
given by the restriction of the identity (36.26) to E. Furthermore, the bound (36.25)
implies that g is defined and thus continuous on $L^q(\Omega)$.

Now consider a level surface $[g = c]$ consisting of regular points of g. Suppose
$(u_n)_{n \in \mathbb{N}}$ is a weakly convergent sequence in E, with limit u. Because of the compact
embedding of E into $L^q(\Omega)$ this sequence converges strongly in $L^q(\Omega)$. Since g
is continuous on $L^q(\Omega)$ we conclude that $(g(u_n))_{n \in \mathbb{N}}$ converges to $g(u)$, thus g is
weakly continuous on E. Therefore all level surfaces of g are weakly closed.

Theorem 33.4 implies that the functional f has a minimizing element $u \in [g = c]$
on the level surface $[g = c]$. By assumption, u is a regular point of g, hence Theorem
35.2 on the existence of a Lagrange multiplier applies and assures the existence of a
number $\lambda \in \mathbb{R}$ such that

$$f'(u) = \lambda\, g'(u). \tag{36.29}$$

In detail this equation reads: $f'(u)(v) = \lambda g'(u)(v)$ for all $v \in E$ and thus for all v in
the dense subspace $\mathcal{D}(\Omega)$ of $E = W_0^{1,p}(\Omega)$.
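For $v \in \mathcal{D}(\Omega)$ we calculate

$$\begin{aligned}
f'(u)(v) &= \int_\Omega F_0(x, u(x), Du(x))\, v(x)\, dx + \int_\Omega \sum_{j=1}^{n} F_j(x, u(x), Du(x))\, \partial_j v(x)\, dx \\
&= \int_\Omega F_0(x, u(x), Du(x))\, v(x)\, dx + \int_\Omega \sum_{j=1}^{n} \partial_j \big[F_j(x, u(x), Du(x))\, v(x)\big]\, dx \\
&\qquad - \int_\Omega \sum_{j=1}^{n} \big(\partial_j F_j(x, u(x), Du(x))\big)\, v(x)\, dx \\
&= \int_\Omega \Big[F_0(x, u(x), Du(x)) - \sum_{j=1}^{n} \partial_j F_j(x, u(x), Du(x))\Big]\, v(x)\, dx
\end{aligned}$$

since the second integral vanishes because of the Gauss divergence theorem and
$v \in \mathcal{D}(\Omega)$. Hence Eq. (36.29) implies

$$\int_\Omega \Big[F_0(x, u(x), Du(x)) - \sum_{j=1}^{n} \partial_j F_j(x, u(x), Du(x)) - \lambda\, G_0(x, u(x))\Big]\, v(x)\, dx = 0$$

for all $v \in \mathcal{D}(\Omega)$. We conclude that u solves the eigenvalue Eq. (36.27).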
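As an illustration of how (36.27) specializes, consider the simplest quadratic choice of integrands. This is a worked sketch, not part of the theorem: the particular F and G below are assumptions chosen for the example (with p = q = 2), and it is taken for granted that they satisfy the hypotheses of Proposition 36.1.

```latex
% Assumed integrands (illustrative choice, p = q = 2):
F(x, u, Du) = \tfrac{1}{2} \sum_{j=1}^{n} (\partial_j u)^2 ,
\qquad
G(x, u) = \tfrac{1}{2} u^2 .

% Partial derivatives with respect to y_0 = u and y_j = \partial_j u:
F_0 = 0, \qquad F_j = \partial_j u \quad (j = 1, \ldots, n), \qquad G_0 = u .

% Equation (36.27) then reads
- \sum_{j=1}^{n} \partial_j \partial_j u = -\Delta u = \lambda u
\quad \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega .
```

In this special case the nonlinear eigenvalue problem reduces to the linear Dirichlet eigenvalue problem for the Laplace operator, and the level surface $[g = c]$ with $c > 0$ is a sphere in $L^2(\Omega)$.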
Remark 36.1 


1. A very important assumption in the problems we solved in this section was that
the domain $\Omega \subset \mathbb{R}^n$ on which we studied differential operators is bounded so
that compact Sobolev embeddings can be used. Certainly, this strategy breaks
down if $\Omega$ is not bounded. Nevertheless there are many important problems on
unbounded domains $\Omega$ and one has to modify the strategy presented above. In
the last 20 years considerable progress has been made in solving these global
problems. The interested reader is referred to the books [1, 3] and in particular to
the book [6] for a comprehensive presentation of the new strategies used for the
global problems.

2. As is well known, a differentiable function can have other critical points than 
minima or maxima for which we have developed a method to prove their exis- 
tence and in favorable situations to calculate them. For these other critical points 
of functionals (saddle points or mountain passes) a number of other, mainly topo- 
logical methods have been shown to be quite effective in proving their existence, 
such as index theories, mountain pass lemmas, perturbation theory. Modern books 
which treat these topics are [2, 6] where one also finds many references to original 
articles. 

3. The well-known mountain pass lemma of Ambrosetti and Rabinowitz is a
beautiful example of results in variational calculus where elementary intuitive
considerations have led to a powerful analytical tool for finding critical points
of functionals f on infinite dimensional Banach spaces E.

To explain this lemma in intuitive terms consider the case of a function f on
$E = \mathbb{R}^2$ which has only positive values. We can imagine that f gives the height
of the surface of the Earth over a certain reference plane. Imagine further a town
$T_0$ which is surrounded by a mountain chain. Then, in order to get to another town
$T_1$ beyond this mountain chain, we have to cross the mountain chain at some point




S. Certainly we want to climb as little as possible, i.e., at a point S with minimal
height $f(S)$. Such a point is a mountain pass of minimal height which is a saddle
point of the function f. All other mountain passes M have a height $f(M) \ge f(S)$.
Furthermore, we know $f(T_0) < f(S)$ and $f(T_1) < f(S)$. In order to get from
town $T_0$ to town $T_1$ we go along a continuous path $\gamma$ which has to wind through
the mountain chain, $\gamma(0) = T_0$ and $\gamma(1) = T_1$. As described above we know
$\sup_{0 \le t \le 1} f(\gamma(t)) \ge f(S)$ and for one path $\gamma_0$ we know $\sup_{0 \le t \le 1} f(\gamma_0(t)) = f(S)$.
Thus, if we denote by $\Gamma$ the set of all continuous paths $\gamma$ from $T_0$ to $T_1$ we get

$$f(S) = \inf_{\gamma \in \Gamma}\ \sup_{0 \le t \le 1} f(\gamma(t)),$$

i.e., the saddle point S of f is determined by a "minimax" principle.

4. If $u \in E$ is a critical point of a differentiable functional f of the form (36.18) on a
Banach space E, then this means that u satisfies $f'(u)(v) = 0$ for all $v \in E$. This
means that u is a weak solution of the (nonlinear) differential equation $f'(u) = 0$.
But in most cases we are actually interested in a strong solution of this equation,
i.e., a solution which satisfies the equation $f'(u) = 0$ at least pointwise almost
everywhere. For a classical solution this equation should be satisfied in the sense
of functions of class $C^2$. For the linear problems which we have discussed in
some detail we have used the special form of the differential operator to argue
that for these problems a weak solution is automatically a classical solution. The
underlying theory is the theory of elliptic regularity. The basic results of this
theory are presented in the books [2, 3].
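The minimax characterization of the mountain pass in item 3 can be made concrete in a small computation. The sketch below rests on several assumptions that are not in the text: the height function, the two towns at (-1, 0) and (1, 0), and the one-parameter family of paths are hypothetical choices, and only this restricted family is searched rather than all continuous paths, so in general such a search only gives an upper bound for the infimum.

```python
import math

def height(x, y):
    """Hypothetical positive landscape: valleys at (-1, 0) and (1, 0), pass at (0, 0)."""
    return 1.0 + (x * x - 1.0) ** 2 + y * y

def path(c, t):
    """Path from town T0 = (-1, 0) to town T1 = (1, 0), bent sideways by the amount c."""
    return (-1.0 + 2.0 * t, c * math.sin(math.pi * t))

def minimax(c_values, t_steps=200):
    """Return (lowest peak height, corresponding c) over the sampled family of paths."""
    best = None
    for c in c_values:
        # Highest point reached along the path with parameter c.
        peak = max(height(*path(c, k / t_steps)) for k in range(t_steps + 1))
        if best is None or peak < best[0]:
            best = (peak, c)
    return best

c_values = [k / 10 for k in range(-10, 11)]
pass_height, best_c = minimax(c_values)
print(pass_height, best_c)
```

For this landscape the straight path (c = 0) crosses the saddle point (0, 0), whose height 2 lies strictly above the height 1 of both towns, in line with the inequalities $f(T_0) < f(S)$ and $f(T_1) < f(S)$.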


36.4 Exercises 


1. Let H be a real Hilbert space and $K \subset H$ a closed convex subset. For every
$x \in H$ show: The functional $\psi_x : K \to \mathbb{R}$ defined by $\psi_x(z) = \|x - z\|$ is strictly
convex and coercive (if K is not bounded).

2. Calculate the derivative of the functional f, Eq. (36.18) by using the assumptions
(36.17) and (36.21).

3. For the Sobolev space $H_0^1([a,b])$ prove:

a) $|u(x) - u(y)| \le \|u'\|_2 \sqrt{|x - y|}$ for all $u \in H_0^1([a,b])$ and all $x, y \in [a,b]$;
b) $\|u\|_2 \le \|u'\|_2$ for all $u \in H_0^1([a,b])$.

4. Suppose functions $p, q \in C([a,b])$ are given which satisfy the lower bounds
$p(x) \ge c > 0$ and $q(x) \ge -r$ with $r > 0$ such that $c_0 = c - r(b-a) > 0$. Prove:
a) Given any $g \in L^2([a,b])$, the functional

$$f(u) = \frac{1}{2} \int_a^b p(x)\, u'(x)^2\, dx + \frac{1}{2} \int_a^b q(x)\, u(x)^2\, dx - \int_a^b g(x)\, u(x)\, dx
= \frac{1}{2} \langle u', p u' \rangle_2 + \frac{1}{2} \langle u, q u \rangle_2 - \langle g, u \rangle_2$$

has a unique minimum $u_0$ on the Sobolev space $H_0^1([a,b])$;




b) this unique minimum $u_0$ solves the Sturm–Liouville problem for the interval
[a,b] and the coefficient functions p, q, g, i.e., the problem of solving the
equation

$$-\frac{d}{dx}\Big[p(x)\, \frac{du}{dx}(x)\Big] + q(x)\, u(x) = g(x) \qquad \forall\, x \in (a,b) \tag{36.30}$$

for the boundary conditions $u(a) = 0 = u(b)$.
Hints: Observe the previous problem and show that f is a strictly convex coercive
functional on the Sobolev space $H_0^1([a,b])$. Conclude by our general results.
Deduce that under the assumptions $g \in C([a,b])$ and $p \in C^1([a,b])$ the weak
solution $u_0$ is actually a classical solution of the Sturm–Liouville problem (36.30).

5. Given an exponent $1 < p < n$ and an exponent q satisfying $2 \le q < p^*$ find
an exponent r, $1 \le r < p^*$, such that $(q-1)r' < p^*$ where $r'$ is the Hölder
conjugate exponent of the exponent r. Show that the Sobolev space $W^{1,p}([a,b])$
is contained in the space of continuous functions $C([a,b])$ on the closed interval
[a,b] and that the identical embedding is completely continuous, i.e., continuous
and compact.

Hints: For $u \in W^{1,p}([a,b])$ and $x, y \in (a,b)$ show first that $|u(x) - u(y)| \le
|x - y|^{1/p'}\, \|u'\|_p$.

6. For a bounded open set $\Omega \subset \mathbb{R}^n$ with smooth boundary $\partial\Omega$ and an exponent p,
$2 < p < 2^* = \frac{2n}{n-2}$, find a solution of the following nonlinear boundary value
problem:

$$-\Delta u + \lambda u = u |u|^{p-2}, \qquad u > 0 \ \text{ in } \Omega,$$

and $u = 0$ on $\partial\Omega$. Assume $\lambda > -\lambda_1$, $\lambda_1$ the smallest eigenvalue of the Dirichlet–
Laplace operator on $\Omega$.
Hints: Consider the functional $f(u) = \frac{1}{2} \langle Du, Du \rangle_2 + \frac{\lambda}{2} \langle u, u \rangle_2$ on the Sobolev
space $E = H_0^1(\Omega)$ and minimize it under the constraint $g(u) = 1$ with the
constraint functional $g(u) = \frac{1}{p} \int_\Omega |u(x)|^p\, dx$. Apply the theorem on the existence
of a Lagrange multiplier and show that the Lagrange multiplier is positive, using
the lower bound for the parameter $\lambda$. Finally use a rescaling argument.

7. For a bounded open set $\Omega \subset \mathbb{R}^n$ with smooth boundary $\partial\Omega$ and an exponent p,
$2 < p < 2^* = \frac{2n}{n-2}$, solve the nonlinear eigenvalue problem

$$-\Delta u + A u = \lambda\, B(x)\, u |u|^{p-2} \quad \text{in } \Omega$$

under Dirichlet boundary conditions. A is a bounded symmetric operator in $L^2(\Omega)$
with lower bound $A > -\lambda_1$, $\lambda_1$ the smallest eigenvalue of the Dirichlet–Laplace
operator on $\Omega$. B is a nonnegative essentially bounded function on $\Omega$, $B \ne 0$.
Hints: Minimize the functional $f(u) = \frac{1}{2} \langle Du, Du \rangle_2 + \frac{1}{2} \langle u, Au \rangle_2$ on the
Sobolev space $E = H_0^1(\Omega)$ on $[g = 1]$ for the constraint functional $g(u) =
\frac{1}{p} \int_\Omega B(x)\, |u(x)|^p\, dx$. Apply the theorem on the existence of a Lagrange multiplier.

8. For a bounded open set $\Omega \subset \mathbb{R}^n$ with smooth boundary $\partial\Omega$ and an exponent p,
$2 \le p < \infty$, show that there exists a weak solution $u \in W_0^{1,p}(\Omega)$ of the boundary
value problem

$$-\nabla \cdot \big(|\nabla u|^{p-2}\, \nabla u\big) = g \quad \text{in } \Omega,$$




$$u = 0 \quad \text{on } \partial\Omega$$

in the sense that u satisfies the equation

$$\int_\Omega \big[\,|\nabla u|^{p-2}\, \nabla u \cdot \nabla v - g v\,\big]\, dx = 0 \qquad \forall\, v \in \mathcal{D}(\Omega) \tag{36.31}$$

where g is any given element in the dual space $W_0^{1,p}(\Omega)'$.
Hints: Consider the functional

$$f(u) = \frac{1}{p} \int_\Omega |\nabla u|^p\, dx - \int_\Omega g u\, dx$$

and show that it is well defined and of class $C^1$ on the Banach space $E = W_0^{1,p}(\Omega)$.
Furthermore, show that the left-hand side of Eq. (36.31) is just the directional
derivative of f in the direction v. Now verify the hypotheses of one of the
generalized Weierstrass theorems, i.e., show that f is weakly lower semi-continuous
on E and coercive. Deduce that a minimizer u of f on E exists and that it satisfies
Eq. (36.31).


References 


1. Lieb E, Loss M. Analysis. vol. 14 of Graduate Studies in Mathematics. 2nd ed. AMS, Providence, 
Rhode Island; 2001. 

2. Jost J, Li-Jost X. Calculus of variations. vol. 64 of Cambridge Studies in advanced mathematics. 
Cambridge: Cambridge University Press; 1998. 

3. Blanchard Ph, Brüning E. Variational methods in mathematical physics. A unified approach.
Texts and Monographs in Physics. Berlin: Springer-Verlag; 1992.

4. Vainberg MM. Variational methods for the study of nonlinear operators. London: Holden Day; 
1964. 

5. Dacorogna B. Weak continuity and weak lower semicontinuity of non-linear functionals. vol. 
922 of Lecture Notes in Mathematics. Berlin: Springer-Verlag; 1982. 

6. Struwe M. Variational methods: applications to nonlinear partial differential equations and
Hamiltonian systems. vol. 34 of Ergebnisse der Mathematik und ihrer Grenzgebiete, Folge 3.
3rd ed. Berlin: Springer-Verlag; 2000.


Chapter 37 
Density Functional Theory 
of Atoms and Molecules 


The Schrödinger equation is a (linear) partial differential equation that can be solved
exactly only in very few special cases such as the Coulomb potential or the
harmonic oscillator potential. For more general potentials or for problems with more
than two particles, the quantum mechanical problem is no easier to solve than the
corresponding classical one. In these situations, variational methods are one of the
most powerful tools for deriving approximate eigenvalues E and eigenfunctions $\psi$.
These approximations are done in terms of a theory of density functionals as
proposed by Thomas, Fermi, Hohenberg and Kohn. This chapter explains briefly the
basic facts of this theory.


37.1 Introduction 


Suppose that the spectrum $\sigma(H)$ of a Hamilton operator H is purely discrete and can
be ordered according to the size of the eigenvalues, i.e., $E_1 \le E_2 \le E_3 \le \cdots$. The
corresponding eigenfunctions $\psi_i$ form an orthonormal basis of the Hilbert space $\mathcal{H}$.
Consider a trial function

$$\psi = \sum_{i=1}^{\infty} c_i \psi_i, \qquad \sum_{i=1}^{\infty} |c_i|^2 = 1.$$

The expectation value of H in the state $\psi$ is

$$E = \langle \psi, H \psi \rangle = \sum_{i=1}^{\infty} |c_i|^2 E_i.$$

Using the normalization $\sum_i |c_i|^2 = 1$ it can be rewritten as

$$E = E_1 + |c_2|^2 (E_2 - E_1) + |c_3|^2 (E_3 - E_1) + \cdots \ge E_1.$$

Hence, E is an upper bound for the eigenvalue $E_1$ which corresponds to the ground
state of the system. One basic idea of the variational calculations concerning spectral

© Springer International Publishing Switzerland 2015 563 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2_37 




properties of atoms and molecules is to choose trial functions depending on some 
parameters and then to adjust the parameters so that the corresponding expectation 
value E is minimized. 
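A standard textbook illustration of this parameter adjustment is the hydrogen atom with a one-parameter Gaussian trial function. The sketch below is not taken from this chapter and makes the following assumptions: atomic units are used, so that $H = -\frac{1}{2}\Delta - \frac{1}{r}$ with exact ground state energy $-\frac{1}{2}$; the trial function is $\psi_\alpha(x) \propto e^{-\alpha |x|^2}$, for which the expectation value is $E(\alpha) = \frac{3}{2}\alpha - 2\sqrt{2\alpha/\pi}$; and the minimization is done by a simple golden-section search.

```python
import math

def energy(alpha):
    """Expectation value of H = -Delta/2 - 1/r for the Gaussian exp(-alpha r^2)."""
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)

def golden_section_min(fn, lo, hi, tol=1e-10):
    """Minimize a unimodal function fn on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if fn(c) < fn(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return 0.5 * (a + b)

alpha_opt = golden_section_min(energy, 0.01, 2.0)
e_opt = energy(alpha_opt)
print(alpha_opt, e_opt)
```

Setting $E'(\alpha) = 0$ by hand gives $\alpha = \frac{8}{9\pi}$ and $E = -\frac{4}{3\pi} \approx -0.424$, which the search reproduces. As the general argument predicts, this value lies above the exact ground state energy $-0.5$.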

Application of this method to the helium atom by Hylleraas played an important
role in 1928–1929 when it provided the first test of the Schrödinger equation for
a system that is more complicated than the hydrogen atom. In the limit of infinite
nuclear mass, the Hamilton operator for the helium atom is

$$H = -\frac{\hbar^2}{2m} (\Delta_1 + \Delta_2) - \frac{2e^2}{r_1} - \frac{2e^2}{r_2} + \frac{e^2}{r_{12}},$$

where $r_j = |x_j|$ and where $r_{12} = |x_1 - x_2|$ is the electron–electron separation. The
term $\frac{e^2}{r_{12}}$ describes the Coulomb repulsion between two electrons. Hylleraas
introduced trial functions of the form $\psi = \sum_{i,j,k} a_{ijk}\, r_1^i\, r_2^j\, r_{12}^k\, e^{-\alpha r_1 - \beta r_2}$ depending on the
parameters $a_{ijk}$, $\alpha$ and $\beta$.

The history of the density functional theory dates back to the pioneering work of
Thomas [1] and Fermi [2]. In the 1960s, Hohenberg and Kohn [3] and Kohn and
Sham [4] made substantial progress to give the density functional theory a foundation
based on the quantum mechanics of atoms and molecules. Since then an enormous
number of results has been obtained, and this method of studying solutions of many
electron problems for atoms and molecules has become competitive in accuracy with
up to date quantum chemical methods.

The following section gives a survey of the most prominent of these density
functional theories. These density functional theories are of considerable mathematical
interest since they present challenging minimization problems of a type which has
not been attended to before. In these problems, one has to minimize certain
functionals over spaces of functions defined on unbounded domains (typically on $\mathbb{R}^3$) and
where nonreflexive Banach spaces are involved.

The last section reports on the progress in relating these density functional theories
to the quantum mechanical theory of many electron systems for atoms and molecules.
Here the results on self-adjoint Schrödinger operators obtained in Part B will be the
mathematical basis. The results on the foundation of density functional theories are
mainly due to Hohenberg–Kohn [3] and Kohn–Sham [4]. The original paper by
Hohenberg–Kohn has generated a vast literature, see, for instance, [5–9].

Fifty years after the starting point of density functional theory its applications in
chemistry and the study of electronic structures have been growing steadily, but the
precise form of the energy functional is still elusive. Recently, in the same spirit, a
formulation with phase space variables and Wigner functions has been suggested in
[10].




37.2 Semiclassical Theories of Density Functionals 


The main goal of these semiempirical models is to describe correctly the ground state
energy by minimizing various types of density functionals.

In all these density functional theories, we are looking for the energy and the
charge density of the ground state by solving directly a minimization problem of the
form

$$\min \Big\{ F(\rho) + \int \rho(x)\, v(x)\, dx \ \Big|\ \rho \in \mathcal{D}_F \Big\}.$$

Here, F is a functional of the charge density and depends only on the number N of
electrons but not on the potential v generated by the nuclei. The minimum has to be
calculated over a set $\mathcal{D}_F$ of densities which is either equal to or a subset of $\mathcal{D}_N =
\{ \rho \in L^1(\mathbb{R}^3) : 0 \le \rho,\ \|\rho\|_1 = N \}$ depending on the specific theory considered. Let
us mention some of the prominent models.

• The model of Thomas and Fermi uses the functional

$$F_{TF}(\rho) = c_{TF} \int_{\mathbb{R}^3} \rho(x)^{5/3}\, dx + D(\rho, \rho)$$

on the domain $\mathcal{D}_{TF} = \mathcal{D}_N \cap L^{5/3}(\mathbb{R}^3)$. In the simplest models of this theory, the
potential v is given by $v(x) = -\frac{Z}{|x|}$ where $Z > 0$ is a fixed parameter representing
the charge of the atomic nucleus and

$$D(\rho, \rho) = \frac{1}{2} \int\!\!\int \frac{\rho(x)\, \rho(y)}{|x - y|}\, dx\, dy$$

is nothing else than the Coulomb energy for the charge density $\rho$. The constant
$c_{TF}$ has the value 3/5.
• The model of Thomas–Fermi–von Weizsäcker is associated with the functional

$$F_{TFW}(\rho) = c_W \int_{\mathbb{R}^3} \big(\nabla \rho(x)^{1/2}\big)^2\, dx + F_{TF}(\rho)$$

on the domain $\mathcal{D}_{TFW} = \mathcal{D}_{TF} \cap H^1(\mathbb{R}^3)$.
• The model of Thomas–Fermi–Dirac–von Weizsäcker leads to the functional

$$F_{TFDW}(\rho) = F_{TFW}(\rho) - c_D \int_{\mathbb{R}^3} \rho(x)^{4/3}\, dx$$

on the same domain $\mathcal{D}_{TFW}$. Note that for $1 \le p_1 < q < p_2$ one has $L^{p_1}(\mathbb{R}^3) \cap
L^{p_2}(\mathbb{R}^3) \subset L^q(\mathbb{R}^3)$ and $\|u\|_q \le \|u\|_{p_1}^{t}\, \|u\|_{p_2}^{1-t}$ with $t = \frac{1/q - 1/p_2}{1/p_1 - 1/p_2}$ which we apply
for $p_1 = 1$, $p_2 = 5/3$ and $q = 4/3$. It follows that $\|\rho\|_{4/3}$ is finite on $\mathcal{D}_{TFW}$.
Therefore, the domain of $F_{TFDW}$ is $\mathcal{D}_{TFW}$.
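The interpolation step used for the Dirac term can be made explicit. The following worked computation is a sketch based on the standard interpolation inequality $\|u\|_q \le \|u\|_{p_1}^{t} \|u\|_{p_2}^{1-t}$ with $\frac{1}{q} = \frac{t}{p_1} + \frac{1-t}{p_2}$; the symbol t for the interpolation parameter is a choice made here.

```latex
% Interpolation parameter for p_1 = 1, p_2 = 5/3, q = 4/3:
\frac{1}{q} = \frac{t}{p_1} + \frac{1-t}{p_2}
\quad\Longrightarrow\quad
\frac{3}{4} = t + \frac{3}{5}\,(1 - t)
\quad\Longrightarrow\quad
t = \frac{3}{8} .

% Resulting bound for a density \rho with \|\rho\|_1 = N:
\|\rho\|_{4/3} \le \|\rho\|_{1}^{3/8}\, \|\rho\|_{5/3}^{5/8} ,
\qquad
\int_{\mathbb{R}^3} \rho(x)^{4/3}\, dx = \|\rho\|_{4/3}^{4/3}
\le N^{1/2}\, \|\rho\|_{5/3}^{5/6} .
```

So the Dirac correction is finite whenever the Thomas–Fermi kinetic term $\int \rho^{5/3}\, dx$ is finite, with no condition beyond $\rho \in L^1 \cap L^{5/3}$.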


All these models describe partially some observed natural phenomena but are nev- 
ertheless rather rudimentary and are no longer in use in the practice of quantum 




chemistry. From a theoretical point of view these models are quite interesting since 
we are confronted with the same type of (mathematical) difficulties as in more realistic 
approaches. 

Though the Thomas–Fermi theory is quite old, a mathematically rigorous solution
of the minimization problem was found only in 1977 by Lieb and Simon [11].
The basic aspects of this solution are discussed in [12, 13].


37.3 Hohenberg–Kohn Theory


The Hohenberg—Kohn theory is a successful attempt to link these semiclassical 
density functional theories to the quantum mechanics of atoms and molecules. Never- 
theless from a mathematical point of view there remain several challenging problems 
as we will see later. 

The N-particle Hamilton operators which are considered are assumed to be of the
form

$$H_N = H_N(v) = -\sum_{j=1}^{N} \Delta_j + \sum_{j<k} u(x_j - x_k) + \sum_{j=1}^{N} v(x_j) = H_0 + V, \tag{37.1}$$

where $v(x)$ is a real-valued function on $\mathbb{R}^3$ and $V = \sum_{j=1}^{N} v(x_j)$. In typical situations
u denotes the Coulomb interaction, but many other interactions can be used in this
approach too. We restrict ourselves to the Coulomb case $u(x_j - x_k) = \frac{1}{|x_j - x_k|}$. In this
case, the operator $H_0$ is well defined and self-adjoint on the domain $D(T)$ of the operator
$T = -\sum_{j=1}^{N} \Delta_j$ of the kinetic energy (compare Theorem 23.9 and the exercises for
this theorem). For the one-particle potential v we assume in the following always
$v \in L^{3/2}(\mathbb{R}^3) + L^\infty(\mathbb{R}^3)$ so that for these potentials too Kato's perturbation theory
applies and assures that $H_N$ is self-adjoint on $D(T)$. Note that $L^{3/2}(\mathbb{R}^3) + L^\infty(\mathbb{R}^3)$ is
a Banach space when equipped with the norm

$$\|v\| = \inf \big\{ \|v_1\|_{3/2} + \|v_2\|_\infty : v_1 \in L^{3/2}(\mathbb{R}^3),\ v_2 \in L^\infty(\mathbb{R}^3),\ v = v_1 + v_2 \big\}.$$

However, this Banach space is not reflexive. It is actually the topological dual of the
Banach space $X = L^1(\mathbb{R}^3) \cap L^3(\mathbb{R}^3)$ for the norm $\|u\| = \|u\|_1 + \|u\|_3$, i.e.,

$$X' = L^{3/2}(\mathbb{R}^3) + L^\infty(\mathbb{R}^3).$$

In 1964, Hohenberg and Kohn proposed a method to solve the problem of finding
the ground state energy of $H_N$ through a variational principle. To explain this method
we need some preparation. The single-particle reduced density matrix $\gamma$ of an
N-particle wave function $\psi$ is given by the kernel

$$\gamma(z, z') = \int \psi(z, z_2, \ldots, z_N)\, \overline{\psi(z', z_2, \ldots, z_N)}\, dz_2 \cdots dz_N, \tag{37.2}$$



where $z_j = (x_j, \sigma_j)$ denotes the space variable $x_j$ and the spin variable $\sigma_j$. This
formula defines a mapping $\psi \mapsto \gamma$. This density matrix allows us to express the
single particle density as

$$\rho(x) = N \sum_{\sigma} \gamma((x, \sigma), (x, \sigma)) \tag{37.3}$$

which defines a mapping $\psi \mapsto \rho_\psi$ and thus a mapping $v \mapsto \rho_\psi = R(v)$ from potentials
v to one-particle densities when $\psi$ is a ground state of $H_N(v)$. This mapping R
plays a fundamental role in the Hohenberg–Kohn theory. Denote by $G_N$ the set of
all those potentials v for which the Hamiltonian $H_N(v)$ has a (unique) ground state
$\psi \in D(T)$. Then we consider R as a mapping

$$R : G_N \cap X' \to \{ \rho \in L^1(\mathbb{R}^3) : 0 \le \rho \}, \tag{37.4}$$

and one wants to know when this mapping has an inverse. In order to be able to
make progress in this problem one has to have a characterization of the range of the
mapping R, i.e., one has to know: Under which conditions on $\rho$ is there a potential
$v \in G_N \cap X'$ such that the Hamilton operator $H_N(v)$ has a ground state $\psi$ which
defines $\rho = \rho_\psi$ through Eqs. (37.2) and (37.3)?

Up to now this problem has found only a partial solution which nevertheless allows
us to proceed. There are two conditions which are obviously necessary, namely
$0 \le \rho(x)$ for all $x \in \mathbb{R}^3$ and $\|\rho\|_1 = N$, i.e., $\rho \in L^1(\mathbb{R}^3)$. The following lemma
gives additional necessary conditions.


Lemma 37.1 Suppose $\rho = \rho_\psi$ is obtained by Eqs. (37.2) and (37.3) from a state
$\psi$ of finite kinetic energy $T(\psi)$. Then


a) $\rho^{1/2} \in H^1(\mathbb{R}^3)$ and $\|\nabla \rho^{1/2}\|_2^2 \le T(\psi)$
b) $\rho \in L^3(\mathbb{R}^3) \cap L^1(\mathbb{R}^3)$ and $\|\rho_\psi\|_3 \le \text{constant}\; T(\psi)$


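Proof The kinetic energy is defined by

$$T(\psi) = \sum_{i=1}^{N} \int |\nabla_i \psi(x_1, \ldots, x_N)|^2\, dx_1 \cdots dx_N
= N \int |\nabla_1 \psi(x_1, \ldots, x_N)|^2\, dx_1 \cdots dx_N.$$

For the density we calculate

$$\nabla \rho(x) = N \int \big[\, \overline{(\nabla_1 \psi)(x, x_2, \ldots, x_N)}\, \psi(x, x_2, \ldots, x_N) + \overline{\psi(x, x_2, \ldots, x_N)}\, (\nabla_1 \psi)(x, x_2, \ldots, x_N) \big]\, dx_2 \cdots dx_N,$$

and Schwarz' inequality implies

$$|\nabla \rho(x)|^2 \le 4N \int |\nabla_1 \psi(x, x_2, \ldots, x_N)|^2\, dx_2 \cdots dx_N\ \rho(x).$$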




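We deduce

$$\|\nabla \rho^{1/2}\|_2^2 = \frac{1}{4} \int \frac{|\nabla \rho(x)|^2}{\rho(x)}\, dx \le T(\psi).$$

This implies Part a).
Sobolev's inequality in $\mathbb{R}^3$ states (see (36.9)) $\|u\|_6^2 \le S\, \|\nabla u\|_2^2$ which we apply
for $u = \rho^{1/2}$ to get $\|\rho\|_3 = \|\rho^{1/2}\|_6^2 \le S\, \|\nabla \rho^{1/2}\|_2^2 \le S\, T(\psi) < \infty$. Thus, the
statement of part b) follows.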


Corollary 37.1

$$\operatorname{ran} R \subseteq \{ \rho \in L^1(\mathbb{R}^3) \cap L^3(\mathbb{R}^3) : 0 \le \rho,\ \rho^{1/2} \in H^1(\mathbb{R}^3) \} = \mathcal{D}$$

and for $\rho \in \mathcal{D}$ there is a state $\psi$ in the domain $D(T)$ such that $\rho = \rho_\psi$.


Proof The first part of the corollary is just a summary of the previous lemma. Given
$\rho \in \mathcal{D}$ define $\psi$ as a normalized symmetric N-fold tensor product of $\rho^{1/2}$. Since
$\int (\nabla \rho^{1/2})^2\, dx < \infty$ it follows that $\psi \in D(T)$.

Note that this corollary only gives some estimate of the set of those densities $\rho$
for which there is $v \in G_N \cap X'$ such that $\rho$ is the density of a ground state $\psi$ of
$H_N(v)$. The problem is that the set $G_N$ is not known explicitly and thus the range of
the map R is not known precisely.

The map $\psi \mapsto \rho_\psi$ is clearly not bijective and different $\psi$ can give the same $\rho$.
However, one can prove continuity though the proof is not too easy (see the appendix
of [14]). Part of the difficulty comes from the fact that this map is not linear. Observe
that the space $H^1(\mathbb{R}^{3N})$ is the form domain of the kinetic energy T.


Theorem 37.1 $\psi \mapsto \rho^{1/2}$ is a continuous map $H^1(\mathbb{R}^{3N}) \to H^1(\mathbb{R}^3)$.

Recall that we only consider one-particle potentials $v \in X'$ so that the domain of
the N-particle Hamiltonian $H_N(v)$ is the domain

$$W_N = \{ \psi \in L^2(\mathbb{R}^{3N}) : T(\psi) < \infty \} = D(T)$$

of the kinetic energy T. This allows us to determine the ground state energy of $H_N(v)$
as the solution of a minimization problem:

$$E(v) = \inf_{\psi \in W_N \setminus \{0\}} \frac{\langle \psi, H_N(v)\, \psi \rangle}{\langle \psi, \psi \rangle} \tag{37.5}$$

There may or may not be a minimizing element $\psi$ for the minimization problem
(37.5) for the ground state energy. And if there exists one we do not always have
uniqueness. Accordingly, any minimizing element $\psi$ of (37.5) is called a ground
state of $H_N(v)$. It satisfies $H_N(v)\psi = E(v)\psi$ at least in the sense of distributions.
$E(v)$ has some important properties.




Theorem 37.2 The ground state energy $E(v)$ defined by (37.5) has the following
properties.


a) $E(v)$ is concave in $v \in X'$, i.e., for all $v_1, v_2 \in X'$ and all $0 \le t \le 1$ one has
$E(t v_1 + (1-t) v_2) \ge t E(v_1) + (1-t) E(v_2)$

b) $E(v)$ is monotone increasing, i.e., if $v_1, v_2 \in X'$ and $v_1(x) \le v_2(x)$ for all $x \in \mathbb{R}^3$,
then $E(v_1) \le E(v_2)$

c) $E(v)$ is continuous with respect to the norm of $X'$ and it is locally Lipschitz.


Proof See the Exercises. 
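The three properties can be observed in a finite-dimensional toy analogue. The following sketch is an illustration only, not a proof and not the setting of the theorem: it assumes a hypothetical two-site model in which the Hamiltonian is the symmetric matrix with diagonal entries $v_1, v_2$ (the "potential") and off-diagonal entries $-1$ (a stand-in for the kinetic part), whose lowest eigenvalue has a closed form. Concavity, monotonicity and a Lipschitz bound with respect to the maximum norm are then tested on random samples.

```python
import math
import random

def ground_energy(v1, v2):
    """Lowest eigenvalue of the 2x2 matrix [[v1, -1], [-1, v2]] (closed form)."""
    return 0.5 * (v1 + v2) - math.sqrt(0.25 * (v1 - v2) ** 2 + 1.0)

def check_properties(samples=1000, seed=0, slack=1e-12):
    """Test concavity, monotonicity and the Lipschitz bound on random potentials."""
    rng = random.Random(seed)
    for _ in range(samples):
        v = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
        w = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
        t = rng.random()
        e_v, e_w = ground_energy(*v), ground_energy(*w)

        # a) concavity along the segment between v and w
        mix = tuple(t * a + (1.0 - t) * b for a, b in zip(v, w))
        concave = ground_energy(*mix) >= t * e_v + (1.0 - t) * e_w - slack

        # b) monotonicity: 'upper' dominates v componentwise
        upper = tuple(max(a, b) for a, b in zip(v, w))
        monotone = e_v <= ground_energy(*upper) + slack

        # c) Lipschitz bound with respect to the maximum norm of v - w
        distance = max(abs(a - b) for a, b in zip(v, w))
        lipschitz = abs(e_v - e_w) <= distance + slack

        if not (concave and monotone and lipschitz):
            return False
    return True

print(check_properties())
```

The mechanism is the same as in the infinite-dimensional case: the energy is an infimum of functions that are affine and increasing in the potential, and such an infimum is concave and increasing.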

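The key result of the Hohenberg–Kohn theory is the observation that under certain
conditions different potentials $v_1, v_2 \in G_N \cap X'$ lead to different densities $\rho_1, \rho_2$,
thus proving injectivity of the map R.


Theorem 37.3 (Uniqueness Theorem) Suppose $v_1, v_2 \in G_N \cap X'$ are potentials
for which the Hamilton operators $H_N(v_1)$ and $H_N(v_2)$, respectively, have different
ground states $\psi_1, \psi_2$. Then the densities $\rho_{\psi_1}, \rho_{\psi_2}$ defined by these states are different,
$\rho_{\psi_1}(x) \ne \rho_{\psi_2}(x)$ for all points x in a set of positive Lebesgue measure.


Proof We give the proof for the case where the ground state energies for both
operators $H_N(v_1)$ and $H_N(v_2)$ are not degenerate. For the general case, we refer to
the literature [7].

According to our definitions, we know $E(v_i) = \langle \psi_i, H_N(v_i)\psi_i \rangle$, $\psi_i \in W_N$,
$\|\psi_i\| = 1$ and $E(v_i) \le \langle \psi, H_N(v_i)\psi \rangle$ for all $\psi \in W_N$, $\|\psi\| = 1$ and $E(v_i) <
\langle \psi, H_N(v_i)\psi \rangle$ for all $\psi \in W_N$, $\|\psi\| = 1$, $\psi \ne \psi_i$, $i = 1,2$. Equations (37.1)–(37.3)
imply $\langle \psi, H_N(v_i)\psi \rangle = \langle \psi, H_0 \psi \rangle + \int v_i(x)\, \rho_\psi(x)\, dx$, where the normalization
$\|\rho_\psi\|_1 = N$ of (37.3) is used, hence

$$\begin{aligned}
E(v_1) &< \langle \psi_2, H_0 \psi_2 \rangle + \int v_2(x)\, \rho_{\psi_2}(x)\, dx + \int (v_1(x) - v_2(x))\, \rho_{\psi_2}(x)\, dx \\
&= E(v_2) + \int (v_1(x) - v_2(x))\, \rho_{\psi_2}(x)\, dx
\end{aligned}$$

and similarly $E(v_2) < E(v_1) + \int (v_2(x) - v_1(x))\, \rho_{\psi_1}(x)\, dx$. By adding these two
inequalities we get

$$0 > \int (v_1(x) - v_2(x))\, (\rho_{\psi_1}(x) - \rho_{\psi_2}(x))\, dx.$$

If the two densities agreed almost everywhere, the right-hand side would vanish,
which contradicts the strict inequality.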


All the above integrals are well defined because of part b) of Lemma 37.1 and the 
interpolation estimate ||p||, < lolly oll’. 

Note that the assumption that Hy (v,) and Hy (v2) have different ground states ex- 
cludes the case that the potentials differ by a constant. This assumption was originally 
used by Hohenberg—Kohn. 

Certainly one would like to have stronger results based on conditions on the
potentials $v_1, v_2$ which imply that the Hamilton operators $H_N(v_1)$ and $H_N(v_2)$ have
different ground states $\psi_1$ and $\psi_2$. But such conditions are not available here.

The basic Hohenberg–Kohn uniqueness theorem is an existence theorem. It claims
that there exists a bijective map $R : v \mapsto \rho$ between an unknown set of potentials




v and a corresponding set of densities which is unknown as well. Nevertheless,
this result implies that the ground state energy E can in principle be obtained by
using $v = R^{-1}(\rho)$, i.e., the potential v as a functional of the ground state density $\rho$.
However, there is a serious problem since nobody knows this map explicitly.


37.3.1 Hohenberg—Kohn Variational Principle 


Hohenberg and Kohn assume that every one-particle density $\rho$ is defined in terms
of a ground state $\psi$ for some potential v, i.e., $H_N(v)\psi = E(v)\psi$. Accordingly, they
introduce the set

$$A_N = \{ \rho \in L^1 \cap L^3(\mathbb{R}^3) : 0 \le \rho,\ \sqrt{\rho} \in H^1(\mathbb{R}^3),\ \exists \text{ ground state } \psi : \psi \mapsto \rho \}$$

and on $A_N$ they considered the functional

$$F_{HK}(\rho) = E(v) - \int v(x)\, \rho(x)\, dx. \tag{37.6}$$

This definition of $F_{HK}$ requires Theorem 37.3 according to which there is a
one-particle potential v associated with $\rho$, $v = R^{-1}(\rho)$. Using this functional the
Hohenberg–Kohn variational principle reads


Theorem 37.4 (Hohenberg–Kohn Variational Principle) For any $v \in G_N \cap X'$,
the ground state energy is

$$E(v) = \min_{\rho \in A_N} \Big[ F_{HK}(\rho) + \int v(x)\, \rho(x)\, dx \Big] \tag{37.7}$$

It must be emphasized that this variational principle holds only for $v \in G_N \cap X'$
and $\rho \in A_N$. But we have three major problems: The sets $G_N$ and $A_N$ and the form
of the functional $F_{HK}$ are unknown. On one hand the Hohenberg–Kohn theory is an
enormous conceptual simplification since it gives some hints that the semiclassical
density functional theories are reasonable approximations. On the other hand, the
existence Theorem 37.3 does not provide any practical method for calculating
physical properties of the ground state from the one electron density $\rho$. In experiments
we measure $\rho$ but we do not know what Hamilton operator $H_N(v)$ it belongs to.

The contents of the uniqueness theorem can be illustrated by an example. Consider
the $N_2$ and CO molecules. They have exactly the same numbers of electrons and
nuclei, but whereas the former has a symmetric electron density this is not the case for
the latter. We are, therefore, able to distinguish between the molecules. Imagine now
that we add an external electrostatic potential along the bond for the $N_2$ molecule.
The electron density becomes polarized and it is no longer obvious how to distinguish
between $N_2$ and CO. But according to the Hohenberg–Kohn uniqueness theorem it
is possible to distinguish between the two molecules in a unique way.

The Hohenberg–Kohn variational principle provides the justification for the
variational principle of Thomas–Fermi in the sense that $E_{TF}(\rho)$ is an
approximation to the functional $E(\rho)$ associated with the total energy. Let us consider the




functional $E_v(\rho) = F_{HK}(\rho) + \int v(x)\, \rho(x)\, dx$. The Hohenberg–Kohn variational
principle requires that the ground state density is a stationary point of the functional
$E_v(\rho) - \mu \big[ \int \rho(x)\, dx - N \big]$ which gives the Euler–Lagrange equation (assuming
differentiability)

$$\mu = D E_v(\rho) = v + D F_{HK}(\rho), \tag{37.8}$$

where $\mu$ denotes the chemical potential of the system.
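To see what such an Euler–Lagrange equation looks like for an explicit functional, one can replace the unknown $F_{HK}$ by the Thomas–Fermi functional of Sect. 37.2. The computation below is a formal sketch under stated assumptions: the Coulomb energy is taken in the form $D(\rho,\rho) = \frac{1}{2} \int\!\!\int \rho(x)\rho(y)\,|x-y|^{-1}\, dx\, dy$, derivatives are computed formally without discussing the function spaces, and the equation is only claimed on the set where $\rho > 0$.

```latex
% Assumed explicit functional (Thomas-Fermi in place of F_{HK}):
F_{TF}(\rho) = c_{TF} \int_{\mathbb{R}^3} \rho(x)^{5/3}\, dx
  + \frac{1}{2} \int\!\!\int \frac{\rho(x)\, \rho(y)}{|x - y|}\, dx\, dy .

% Formal functional derivative:
D F_{TF}(\rho)(x) = \frac{5}{3}\, c_{TF}\, \rho(x)^{2/3}
  + \int \frac{\rho(y)}{|x - y|}\, dy .

% The stationarity condition \mu = v + D F_{TF}(\rho) becomes
\mu = v(x) + \frac{5}{3}\, c_{TF}\, \rho(x)^{2/3}
  + \int \frac{\rho(y)}{|x - y|}\, dy
\qquad \text{where } \rho(x) > 0 .
```

This is the form in which the Thomas–Fermi equation is usually stated. It shows how the chemical potential $\mu$ ties together the external potential, a local kinetic term and the electrostatic potential generated by $\rho$ itself.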

If we were able to know the exact functional $F_{HK}(\rho)$ we would obtain by this
method an exact solution for the ground state electron density. It must be noted that
$F_{HK}(\rho)$ is defined independently of the external potential v; this property means
that $F_{HK}(\rho)$ is a universal functional of $\rho$. As soon as we have an explicit form
(approximate or exact) for $F_{HK}(\rho)$ we can apply this method to any system and the
Euler–Lagrange Eq. (37.8) will be the basic working equation of the Hohenberg–
Kohn density functional theory. A serious difficulty here is that the functional $F_{HK}(\rho)$
is defined only for those densities which are in the range of the map R, a condition
which, as already explained, is still unknown.


37.3.2 The Kohn—Sham Equations 


The Hohenberg–Kohn uniqueness theorem states that all the physical properties of
a system of $N$ interacting electrons are uniquely determined by its one-electron
ground state density $\rho$. This property holds independently of the precise form of
the electron–electron interaction. In particular when the strength of this interaction
vanishes the functional $F_{HK}(\rho)$ defines the ground state kinetic energy of a system
of noninteracting electrons as a functional of its ground state density, $T_0(\rho)$. This fact
was used by Kohn and Sham [4] in 1965 to map the problem of interacting electrons
for which the form of the functional $F_{HK}(\rho)$ is unknown onto an equivalent problem
for noninteracting particles. To this end $F_{HK}(\rho)$ is written in the form


$$F_{HK}(\rho) = T_0(\rho) + \frac{1}{2}\iint \frac{\rho(x)\rho(y)}{|x-y|}\,dx\,dy + E_{xc}(\rho). \qquad (37.9)$$

The second term is nothing else than the classical electrostatic self-interaction, and
the term $E_{xc}(\rho)$ is called the exchange–correlation energy.

Variation with respect to $\rho$ under the constraint $\|\rho\|_1 = N$ leads formally to
the same equation which holds for a system of $N$ noninteracting electrons under the
influence of an effective potential $V_{scf}$, also called the self-consistent field potential,
whose form is explicitly given by

$$V_{scf}(x) = v(x) + \Big(\rho * \frac{1}{|x|}\Big)(x) + v_{xc}(x), \qquad (37.10)$$

where the term $v_{xc}(x) = D_\rho E_{xc}(\rho)$ is called the exchange–correlation potential, as
the functional derivative of the exchange–correlation energy.
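
The convolution term $(\rho * \frac{1}{|x|})(x)$ in (37.10) can be evaluated explicitly for a
spherically symmetric density. The following sketch, which is an illustration and not part of
the Kohn–Sham scheme itself, uses the hydrogen-like test density $\rho(r) = e^{-2r}/\pi$ (an
assumption, chosen because the result is known in closed form, $1/r - (1 + 1/r)e^{-2r}$) and
compares a simple radial quadrature with that closed form. The function names, the grid size,
and the cutoff radius are hypothetical choices.

```python
import math

def rho(r):
    """Assumed test density: hydrogen-like 1s density, normalized to N = 1."""
    return math.exp(-2.0 * r) / math.pi

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for f on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def hartree_potential(r, r_max=30.0, n=20000):
    """(rho * 1/|x|) at radius r for a radial density, via the shell formula
    (1/r) * int_0^r 4 pi s^2 rho(s) ds + int_r^r_max 4 pi s rho(s) ds."""
    inner = trapezoid(lambda s: 4.0 * math.pi * s * s * rho(s), 0.0, r, n)
    outer = trapezoid(lambda s: 4.0 * math.pi * s * rho(s), r, r_max, n)
    return inner / r + outer

def hartree_exact(r):
    """Closed form of the same potential for the test density above."""
    return 1.0 / r - (1.0 + 1.0 / r) * math.exp(-2.0 * r)

for r in (0.5, 1.0, 2.0, 5.0):
    print(r, hartree_potential(r), hartree_exact(r))
```

For large $r$ the potential approaches $1/r$, the Coulomb potential of one unit of charge, as
expected for a density normalized to $N = 1$.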




There have been a number of attempts to remedy the shortcomings of the 
Hohenberg—Kohn theory. One of the earliest and best known is due to E. Lieb 
[14]. The literature we have mentioned before offers a variety of others. Though 
some progress is achieved major problems are still unresolved. Therefore we cannot 
discuss them here in our short introduction. 

A promising direction seems to be the following. By Theorem 37.2 we know
that $-E(v)$ is a convex continuous functional on $X'$. Hence (see [15]), it can be
represented as the polar functional of its polar functional $(-E)^*$:

$$-E(v) = \sup_{u \in X''} \left[\langle v, u\rangle - (-E)^*(u)\right] \quad \forall\, v \in X', \qquad (37.11)$$

where the polar functional $(-E)^*$ is defined on $X''$ by

$$(-E)^*(u) = \sup_{v \in X'} \left[\langle v, u\rangle - (-E)(v)\right] \quad \forall\, u \in X''. \qquad (37.12)$$

Now $X = L^3(\mathbb{R}^3) \cap L^1(\mathbb{R}^3)$ is contained in the bi-dual $X''$ but this bi-dual is much
larger (L'(R*) is not a reflexive Banach space) and L3(R*) N L'(R?) Cc L?(R3) Nn 
L'(R3). But one would like to have a representation of this form in terms of densities 
$\rho \in A_N \subset L^3(\mathbb{R}^3) \cap L^1(\mathbb{R}^3)$, not in terms of $u \in X''$.


Remark 37.1 In Theorem 37.4, the densities are integrable functions on all of $\mathbb{R}^3$
which complicates the minimization problem in this theorem considerably, as we
had mentioned before in connection with global boundary and eigenvalue problems.
However, having the physical interpretation of the functions $\rho$ in mind as one-particle
densities of atoms or molecules, it is safe to assume that all the relevant densities have
a compact support contained in some finite ball in $\mathbb{R}^3$. Thus, in practice, one considers
this minimization problem over a bounded domain $B$ with the benefit that compact
Sobolev embeddings are available. As an additional advantage we can then work in
the reflexive Banach space $L^3(B)$, since $L^3(B) \subset L^1(B)$, instead of $L^1(\mathbb{R}^3) \cap L^3(\mathbb{R}^3)$.


37.4 Exercises 


1. Prove Theorem 37.2.
   Hints: For $v_1, v_2 \in X'$ and $0 < t < 1$ show first that $H_N(tv_1 + (1-t)v_2) =
   tH_N(v_1) + (1-t)H_N(v_2)$. Part a) now follows easily. For part b) consider $v_1, v_2 \in
   X'$ such that $v_1(x) \le v_2(x)$ for almost all $x \in \mathbb{R}^3$ and show as a first step:
   $\langle \psi, H_N(v_1)\psi\rangle \le \langle \psi, H_N(v_2)\psi\rangle$ for all $\psi \in W_N$, $\|\psi\| = 1$.
   For part c) proceed similarly and show $|\langle \psi, (H_N(v_1) - H_N(v_2))\psi\rangle| \le
   N\|v_1 - v_2\|_\infty$ for all $\psi \in W_N$, $\|\psi\| = 1$. This implies $\pm(E(v_1) - E(v_2)) \le
   N\|v_1 - v_2\|_\infty$.

2. Show that the Coulomb energy functional $D$ is weakly lower semi-continuous on
   the Banach space $L^{6/5}(\mathbb{R}^3)$.

3. Prove: The Thomas–Fermi energy functional $E_{TF}$ is well defined on the cone
   $D_{TF} = \{\rho \in L^{5/3}(\mathbb{R}^3) \cap L^1(\mathbb{R}^3) : \rho \ge 0\}$.




References 


1. Thomas LH. The calculation of atomic fields. Proc Camb Philos Soc. 1927;23:542–8.
2. Fermi E. Un metodo statistico per la determinazione di alcune proprieta dell'atome. Rend
   Accad Naz Lincei. 1927;6:602–7.
3. Hohenberg P, Kohn W. Inhomogeneous electron gas. Phys Rev B. 1964;136:864–71.
4. Kohn W, Sham LJ. Self consistent equations including exchange and correlation effects. Phys
   Rev A. 1965;140:1133–8.
5. Davies EB. Quantum theory of open systems. London: Academic Press; 1976.
6. Parr RG, Yang W. Density functional theory of atoms and molecules. Oxford: Oxford University
   Press; 1989.
7. Dreizler RM, Gross EKU. Density functional theory. New York: Springer-Verlag; 1990.
8. Nagy A. Density functional and application to atoms and molecules. Phys Rep. 1998;298:1–79.
9. Eschrig H. The fundamentals of density functional theory. Leipzig: Teubner Verlag; 1996.
10. Blanchard P, Gracia-Bondia J, Varilly J. Density functional theory on phase space. Int J
    Quantum Chem. 2012;112(4):1134–64.
11. Lieb E, Simon B. The Thomas–Fermi theory of atoms, molecules and solids. Adv Math.
    1977;23:22–116.
12. Blanchard Ph, Brüning E. Variational methods in mathematical physics. A unified approach.
    Texts and monographs in physics. Berlin: Springer-Verlag; 1992.
13. Lieb E, Loss M. Analysis. Graduate studies in mathematics. Vol. 14, 2nd ed. AMS, Providence,
    Rhode Island; 2001.
14. Lieb E. Density functionals for Coulomb systems. Int J Quantum Chem. 1983;XXIV:243–77.
15. Ekeland I, Turnbull T. Infinite dimensional optimization and convexity. Chicago lectures in
    mathematics. Chicago: University of Chicago Press; 1983.


Appendix A 
Completion of Metric Spaces 


A metric on a set $X$ is a function $d : X \times X \to \mathbb{R}$ with these properties:

(D1) $d(x_1, x_2) \ge 0$,
(D2) $d(x_1, x_2) = d(x_2, x_1)$,
(D3) $d(x_1, x_2) \le d(x_1, x) + d(x, x_2)$,
(D4) $d(x_1, x_2) = 0 \Leftrightarrow x_1 = x_2$,

for all $x, x_1, x_2 \in X$. A set $X$ on which a metric $d$ is given is called a metric space
$(X, d)$. Sets of the form

$$B(x,r) = \{y \in X : d(y,x) < r\}$$

are called open balls in $(X,d)$ with center $x$ and radius $r > 0$. These balls are used
to define the topology $\mathcal{T}_d$ on $X$.

A sequence $(x_n)_{n\in\mathbb{N}}$ in $(X,d)$ is called a Cauchy sequence if, and only if, the
distance $d(x_n, x_m)$ of the elements $x_n$ and $x_m$ of this sequence goes to zero as $n, m \to
\infty$. A metric space $(X, d)$ is called complete if, and only if, every Cauchy sequence
has a limit $x$ in $(X, d)$, i.e., if, and only if, for every Cauchy sequence $(x_n)_{n\in\mathbb{N}}$ there
is a point $x \in X$ such that $\lim_{n\to\infty} d(x, x_n) = 0$.

In the text we encountered many examples of metric spaces and in many appli- 
cations it was very important that these metric spaces were complete, respectively 
could be extended to complete metric spaces. We are going to describe in some detail 
the much used construction which enables one to “complete” every incomplete space 
by “adding the missing points.” The model for this construction is the construction 
of the space of real numbers as the space of equivalence classes of Cauchy sequences 
of rational numbers. A complete metric space (Y, D) is called a completion of the 
metric space (X,d) if, and only if, (Y, D) contains a subspace (Yo, Do) which is 
dense in (Y, D) and which is isometric to (X,d). The following result ensures the 
existence of a completion. 
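
The model case mentioned above, the construction of $\mathbb{R}$ from Cauchy sequences of rational
numbers, can be made concrete. The following sketch uses exact rational arithmetic to generate
a Cauchy sequence in $(\mathbb{Q}, |\cdot|)$ whose squares tend to 2, so that it has no limit in
$\mathbb{Q}$. The choice of the Newton iteration and the function name `newton_sqrt2` are
assumptions made for illustration.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates x_{n+1} = (x_n + 2/x_n)/2, starting at x_0 = 1."""
    x = Fraction(1)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq

seq = newton_sqrt2(6)
# Distances between consecutive terms shrink rapidly.
gaps = [abs(seq[n + 1] - seq[n]) for n in range(len(seq) - 1)]
# The defect x_n^2 - 2 tends to zero but never vanishes, since sqrt(2) is irrational.
defects = [x * x - 2 for x in seq]

for x, d in zip(seq, defects):
    print(x, float(d))
```

Every term is an exact rational number, so the "missing point" of this sequence is precisely
what the completion adds.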


Theorem A.1 Every metric space (X,d) has a completion (Y, D). Every two com- 
pletions of (X,d) are isomorphic under an isometry which leaves the points of X 
invariant. 


© Springer International Publishing Switzerland 2015 575 
P. Blanchard, E. Briining, Mathematical Methods in Physics, 
Progress in Mathematical Physics 69, DOI 10.1007/978-3-319-14045-2 




Proof Denote by $S = S(X,d)$ the set of all Cauchy sequences $x = (x_n)_{n\in\mathbb{N}}$ in the
metric space $(X,d)$. Given $x, y \in S$ one has the estimate

$$|d(x_n, y_n) - d(x_m, y_m)| \le d(x_n, x_m) + d(y_n, y_m)$$

which shows that $(d(x_n, y_n))_{n\in\mathbb{N}}$ is a Cauchy sequence in the field $\mathbb{R}$ and thus
converges. This allows one to define a function $d_1 : S \times S \to \mathbb{R}$ by

$$d_1(x, y) = \lim_{n\to\infty} d(x_n, y_n).$$

Obviously the function $d_1$ has the properties (D1) and (D2) of a metric. To verify the
triangle inequality (D3) observe that for any $x, y, z \in S$ and all $n \in \mathbb{N}$ we have

$$d(x_n, y_n) \le d(x_n, z_n) + d(z_n, y_n).$$

The standard calculation rules for limits imply that this inequality also holds in the
limit $n \to \infty$ and thus proves (D3) for the function $d_1$. The separation property
(D4) however does not hold for the function $d_1$. Therefore we introduce in $S$ an
equivalence relation which expresses this separation property.

Two Cauchy sequences $x, y \in S$ are called equivalent if, and only if, $d_1(x, y) = 0$.
We express this equivalence relation by $x \sim y$. The properties established thus far for
the function $d_1$ imply that this is indeed an equivalence relation on $S$. The equivalence
class determined by the element $x \in S$ is denoted by $[x]$, i.e., $[x] = \{y \in S : y \sim x\}$.
The space of all these equivalence classes is called $Y$, $Y = \{[x] : x \in S\}$. Next define
a function $D : Y \times Y \to \mathbb{R}$ by

$$D([x], [y]) = d_1(x, y)$$

where $x, y$ are any representatives of their respective classes. One shows that $D$ is
well defined, i.e., independent of the chosen representative: Suppose $x' \sim x$, then
the triangle inequality for the function $d_1$ gives

$$d_1(x,y) \le d_1(x, x') + d_1(x', y) = d_1(x', y) \le d_1(x', x) + d_1(x, y) = d_1(x, y),$$

which shows that $d_1(x, y) = d_1(x', y)$ whenever $x \sim x'$. By definition, the function
$D$ satisfies the separation property (D4):

$$D([x],[y]) = 0 \Leftrightarrow [x] = [y].$$

We conclude that $D$ is a metric on the set $Y$, hence $(Y, D)$ is a metric space.
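
To illustrate the equivalence relation, the following sketch compares two different rational
Cauchy sequences whose termwise distance tends to zero, so that they define the same
equivalence class in the completion of $\mathbb{Q}$. Both sequences (Newton iterates and a
Pell-type recurrence, each approximating $\sqrt{2}$) and the function names are assumptions
chosen for illustration.

```python
from fractions import Fraction

def newton_iterates(n_terms):
    """x_{n+1} = (x_n + 2/x_n)/2 with x_0 = 1; rational approximations of sqrt(2)."""
    x, out = Fraction(1), []
    for _ in range(n_terms):
        out.append(x)
        x = (x + 2 / x) / 2
    return out

def pell_iterates(n_terms):
    """y_{n+1} = (y_n + 2)/(y_n + 1) with y_0 = 1; also approximates sqrt(2)."""
    y, out = Fraction(1), []
    for _ in range(n_terms):
        out.append(y)
        y = (y + 2) / (y + 1)
    return out

x = newton_iterates(8)
y = pell_iterates(8)
# Termwise distances d(x_n, y_n); their limit is d_1(x, y).
dist = [abs(a - b) for a, b in zip(x, y)]
print([float(t) for t in dist])
```

The two lists differ as sequences, yet the distances `dist` decrease toward zero, which is the
situation $x \sim y$, $x \ne y$ that forces the passage to equivalence classes.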

Next we embed the given metric space into $(Y, D)$. For every $x \in X$ consider the
constant sequence $x^0 = (x,x,x,\ldots)$. Clearly $x^0 \in S$ and thus a map $\tau : X \to Y$ is
well defined by

$$\tau(x) = [x^0] \quad \forall\, x \in X.$$

By the definition of $D$, respectively $d_1$, we have

$$D(\tau(x), \tau(y)) = d_1(x^0, y^0) = d(x, y)$$

for all $x, y \in X$, hence the map $\tau$ is isometric.
Given $[x] \in Y$ choose a representative $x = (x_1, x_2, x_3, \ldots)$ of this class. Then the
sequence $(\tau(x_n))_{n\in\mathbb{N}}$ converges to $[x]$:

$$\lim_{n\to\infty} D(\tau(x_n), [x]) = \lim_{n\to\infty} d_1(x_n^0, x) = \lim_{n\to\infty}\lim_{m\to\infty} d(x_n, x_m) = 0.$$

We conclude that the image $Y_0 = \tau(X)$ of $X$ under the isometry $\tau$ is dense in $(Y, D)$.
Finally we prove completeness of the metric space $(Y, D)$. Suppose $([y_n])_{n\in\mathbb{N}}$
is a Cauchy sequence in $(Y, D)$. Since $Y_0$ is dense in $(Y, D)$ there is a sequence
$(\tau(x_n))_{n\in\mathbb{N}} \subset Y_0$ such that $D(\tau(x_n), [y_n]) < \frac{1}{n}$ for each $n$. It is easy to see that
the sequences $(\tau(x_n))_{n\in\mathbb{N}}$ and $([y_n])_{n\in\mathbb{N}}$ either both converge or both diverge. Now
observe that $x = (x_1, x_2, x_3, \ldots)$ is a Cauchy sequence in the given metric space
$(X,d)$:

$$d(x_n, x_m) = D(\tau(x_n), \tau(x_m)) \le D(\tau(x_n), [y_n]) + D([y_n], [y_m]) + D([y_m], \tau(x_m)) \le \frac{1}{n} + D([y_n], [y_m]) + \frac{1}{m}.$$

Since $([y_n])_{n\in\mathbb{N}}$ is a Cauchy sequence in $(Y, D)$ the statement follows immediately
and therefore $[x] \in Y$. The identity

$$\lim_{n\to\infty} D(\tau(x_n), [x]) = \lim_{n\to\infty}\lim_{m\to\infty} d(x_n, x_m) = 0$$

proves that the sequence $(\tau(x_n))_{n\in\mathbb{N}}$ converges to $[x]$ in the metric space $(Y, D)$. The
construction of the points $x_n$ implies that the given Cauchy sequence too converges
to $[x]$ in $(Y, D)$. Hence this space is complete.

Since we do not use the second part of the theorem we leave its proof as an 
exercise. 


Corollary A.1 Every normed space $(X_0, \|\cdot\|_0)$ has a completion which is a Banach
space $(X, \|\cdot\|)$. Every inner product space $(H_0, \langle\cdot,\cdot\rangle_0)$ has a completion which is a
Hilbert space $(H, \langle\cdot,\cdot\rangle)$.


Proof We only comment on the proof. It is a good exercise to fill in the details.

According to Theorem A.1 we only know that the normed space, respectively the
inner product space, have a completion as a metric space. But since the original space
$X_0$, respectively $H_0$, carries a vector space structure, the space of Cauchy sequences
of elements of these spaces too can be given a natural vector space structure. The
same applies to the space of equivalence classes of such Cauchy sequences. Finally
one has to show that the given norm, respectively the given inner product, has a
natural extension to this space of equivalence classes of Cauchy sequences which is
again a norm, respectively an inner product. Then the proof of completeness of these
spaces is as above.


Appendix B 
Metrizable Locally Convex Topological Vector 
Spaces 


A Hausdorff locally convex topological vector space $(X, P)$ is called metrizable if,
and only if, there is a metric $d$ on $X$ which generates the given topology $\mathcal{T}_P$, i.e., if
$\mathcal{T}_d$ denotes the topology generated by the metric $d$, one has $\mathcal{T}_P = \mathcal{T}_d$. Recall that
two different metrics might generate the same topology. In such a case the two
metrics are called equivalent. Important and large classes of Hausdorff locally convex
topological vector spaces are indeed metrizable.


Theorem B.1 Every Hausdorff locally convex topological vector space $(X,P)$ with
countable system $P = \{p_j : j \in \mathbb{N}\}$ of continuous seminorms $p_j$ is metrizable. A
translation invariant metric which generates the given topology is

$$d(x,y) = \sum_{j=1}^{\infty} \frac{1}{2^j}\,\frac{p_j(x-y)}{1 + p_j(x-y)} \quad \forall\, x, y \in X. \qquad (B.1)$$


Proof All the seminorms $p_j$ are continuous for the topology $\mathcal{T}_P$ and the series (B.1)
converges uniformly on $X \times X$. Therefore this series defines a continuous function
$d$ on $(X, \mathcal{T}_P) \times (X, \mathcal{T}_P)$. This function $d$ obviously satisfies the defining conditions
(D1) and (D2) of a metric. The separation property (D4) holds since the space $(X, P)$
is Hausdorff.

In order to show the triangle inequality (D3) observe first that for any $x, y, z \in X$
one has

$$\frac{p_j(x-y)}{1 + p_j(x-y)} \le \frac{p_j(x-z)}{1 + p_j(x-z)} + \frac{p_j(z-y)}{1 + p_j(z-y)}$$

since all terms are nonnegative and $p_j(x-y) \le p_j(x-z) + p_j(z-y)$. Summation
now implies the triangle inequality for the function $d$ which thus is a metric on $X$.
Obviously this metric $d$ is translation invariant:

$$d(x+z, y+z) = d(x,y) \quad \forall\, x, y, z \in X.$$

Since the metric $d$ is continuous for the topology $\mathcal{T}_P$, the open balls $B_d(x,r)$ for the
metric $d$ are open in $(X, \mathcal{T}_P)$. Since these open balls generate the topology $\mathcal{T}_d$, we
conclude that the topology $\mathcal{T}_P$ is finer than the metric topology $\mathcal{T}_d$, $\mathcal{T}_d \subseteq \mathcal{T}_P$.




In order to show the converse $\mathcal{T}_P \subseteq \mathcal{T}_d$ we prove that every element $V$ of a
neighborhood basis of zero for the topology $\mathcal{T}_P$ contains an open ball $B_d(0,r)$ with
respect to the metric $d$. Suppose

$$V = \bigcap_{i=1}^{k} B_{p_i}(0, r_i)$$

with $r_i > 0$ for $i = 1, \ldots, k$ is given. Choose some number $r_0$,

$$0 < r_0 < \min\left\{\frac{2^{-j} r_j}{1 + r_j} : j = 1, \ldots, k\right\}.$$

Then for fixed $r$, $0 < r < r_0$, and every $x \in B_d(0,r)$ we know by Eq. (B.1) that

$$2^{-i}\,\frac{p_i(x)}{1 + p_i(x)} \le d(x, 0) < r, \quad i = 1, \ldots, k.$$

These inequalities together with the choice of $r < r_0$ imply immediately $p_i(x) < r_i$,
$i = 1, \ldots, k$, hence $x \in V$ and thus $B_d(0,r) \subset V$ which proves $\mathcal{T}_P \subseteq \mathcal{T}_d$.
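
The metric (B.1) can be explored numerically. The following sketch uses the coordinate
seminorms $p_j(x) = |x_j|$ on finite sequences, so that the series (B.1) reduces to a finite
sum; this choice of seminorms and the helper names `frechet_metric` and `translate` are
assumptions made for illustration. It checks translation invariance, the triangle inequality,
and the bound $d < 1$ on a few sample points.

```python
from fractions import Fraction
from itertools import product

def frechet_metric(x, y):
    """d(x, y) = sum_j 2^{-j} p_j(x - y) / (1 + p_j(x - y)) with p_j(x) = |x_j|,
    j = 1, ..., len(x) (a finite family of seminorms, assumed for illustration)."""
    total = Fraction(0)
    for j, (a, b) in enumerate(zip(x, y), start=1):
        p = abs(Fraction(a) - Fraction(b))
        total += Fraction(1, 2**j) * p / (1 + p)
    return total

def translate(x, z):
    """Componentwise sum x + z."""
    return tuple(a + b for a, b in zip(x, z))

points = [(0, 0, 0), (1, -2, 3), (100, 0, -7), (5, 5, 5)]
shift = (3, -1, 4)

for x, y in product(points, repeat=2):
    # Translation invariance: d(x + z, y + z) = d(x, y).
    assert frechet_metric(translate(x, shift), translate(y, shift)) == frechet_metric(x, y)
    # The metric is bounded by sum_j 2^{-j} < 1 even for far-apart points.
    assert frechet_metric(x, y) < 1

for x, y, z in product(points, repeat=3):
    # Triangle inequality (D3).
    assert frechet_metric(x, y) <= frechet_metric(x, z) + frechet_metric(z, y)
```

Because $d$ is bounded by 1, balls of radius $r \ge 1$ are the whole space; only small radii
carry information, which is why the proof above works with $r < r_0$.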
Two examples of Hausdorff locally convex topological vector spaces which are
metrizable and which were used in the text are the spaces $\mathcal{D}_K(\Omega)$ and $\mathcal{S}(\mathbb{R}^n)$.

Recall that for an open and nonempty set $\Omega \subset \mathbb{R}^n$ and a compact subset $K \subset
\Omega$, the space $\mathcal{D}_K(\Omega)$ consists of all $C^\infty$-functions on $\Omega$ which have their support
in $K$. The topology of this space is generated by the countable system of norms
$q_{K,m}$, $m = 0, 1, 2, \ldots$; the space $\mathcal{D}_K(\Omega)$ is metrizable according to Theorem B.1.
In Proposition 2.4 we had indicated a proof of its completeness. Hence $\mathcal{D}_K(\Omega)$ is a
complete metrizable Hausdorff locally convex topological vector space.

The space $\mathcal{S}(\mathbb{R}^n)$ is defined as the space of all those $C^\infty$-functions on $\mathbb{R}^n$ which,
with all their derivatives, decay faster than constant $\times\, (1 + x^2)^{-k/2}$ for any $k =
0, 1, 2, \ldots$. The countable system $\{p_{m,k} : k, m = 0, 1, 2, \ldots\}$ of norms defines the
topology of $\mathcal{S}(\mathbb{R}^n)$. It is a good exercise to prove that this space too is complete.
Therefore the space $\mathcal{S}(\mathbb{R}^n)$ is a complete metrizable Hausdorff topological vector
space.


Appendix C 
The Theorem of Baire 


On an open nonempty set $\Omega \subset \mathbb{R}^n$ consider a sequence of continuous functions
$f_n : \Omega \to \mathbb{R}$ and suppose that this sequence has a "pointwise" limit $f$, i.e., for every
$x \in \Omega$ the limit $\lim_{n\to\infty} f_n(x) = f(x)$ exists. Around 1897, Baire investigated the
question whether the limit function $f$ is continuous on $\Omega$. He found that this is not
the case in general and he found that the set of points in $\Omega$ at which the limit function
$f$ is not continuous is a "rather small subset of $\Omega$." Naturally a precise meaning had
to be given to the expression of a "rather small subset of $\Omega$." In this context Baire
suggested the concept of subset of first category in $\Omega$, i.e., subsets of $\Omega$ which can
be represented as a countable union of nowhere dense sets. And a subset $A \subset \Omega$
is called nowhere dense in $\Omega$ if, and only if, the closure $\overline{A}$ in $\Omega$ has no interior
points. Later the subsets of first category in $\Omega$ were given the more intuitive name
of a meager subset. All subsets which are not of the first category are called subsets
of the second category or nonmeager subsets.

Note the following simple implication of the definition of a nowhere dense subset.
If $B \subset \Omega$ is nowhere dense, then $A = \Omega \setminus \overline{B}$ is an open and dense subset of $\Omega$,
$\Omega = \overline{A}$. Thus Baire reduced the above statement about the set of points of continuity
of the limit function $f$ to the following statement.


Theorem C.1 (Theorem of Baire, Version 1) If $A_j$, $j \in \mathbb{N}$, is a countable family
of open and dense subsets of an open nonempty subset $\Omega \subset \mathbb{R}^n$, then the intersection

$$A = \bigcap_{j\in\mathbb{N}} A_j \qquad (C.1)$$

is also dense in $\Omega$.


Proof Given an open ball $B_0 = B(x_0, r_0) = \{x \in \mathbb{R}^n : \|x - x_0\| < r_0\}$ in $\Omega$ we
have to show that $A \cap B_0$ is not empty.

Since $A_1$ is an open and dense subset of $\Omega$ we know that $A_1 \cap B_0$ is an open
nonempty subset of $B_0$. Hence there is an open ball

$$B_1 = B(x_1, r_1) = \{x \in \mathbb{R}^n : \|x - x_1\| < r_1\}$$

with $\overline{B_1} \subset A_1 \cap B_0$. We can and will assume that $0 < r_1 < r_0/2$. By the same
reasoning $A_2 \cap B_1$ is an open nonempty subset of $B_1$. Hence there is an open ball
$B_2 = B(x_2, r_2) \subset \Omega$ with the property $\overline{B_2} \subset A_2 \cap B_1$ and $0 < r_2 < r_1/2$.




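These arguments can be iterated and thus produce a sequence of open balls $B_k =
B(x_k, r_k)$ satisfying

$$\overline{B_{k+1}} \subset A_{k+1} \cap B_k, \quad 0 < r_{k+1} < \frac{r_k}{2}, \quad k = 0, 1, 2, \ldots.$$

By construction $r_k \le 2^{-k} r_0$ and $x_{k+m} \in B_k$ for all $m \ge 0$, hence $\|x_{k+m} - x_k\| < 2^{-k} r_0$
for all $k, m = 0, 1, 2, \ldots$. We conclude that the sequence of centers $x_k$ of these balls
$B_k$ is a Cauchy sequence in $\mathbb{R}^n$ and thus converges to a unique point

$$y = \lim_{k\to\infty} x_k \in \bigcap_{k=1}^{\infty} \overline{B_k} = M.$$

According to the construction of these balls we get that $M \subset B_0$ and $M \subset A_k$ for
all $k \in \mathbb{N}$ and thus we conclude that $M \subset A \cap B_0$.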
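
The nested-ball construction of the proof can be carried out explicitly in a simple case. In
the following sketch, $\Omega = (0,1)$ and $A_j = \Omega \setminus \{q_j\}$, where $q_1, q_2, \ldots$ is a
list of rational points; each $A_j$ is open and dense in $\Omega$. The enumeration of the points,
the use of closed thirds of an interval in place of closed balls of at most half the radius,
and all function names are assumptions made for illustration.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` distinct rationals p/q in (0, 1), ordered by denominator."""
    seen, out, q = set(), [], 2
    while len(out) < count:
        for p in range(1, q):
            f = Fraction(p, q)
            if f not in seen:
                seen.add(f)
                out.append(f)
                if len(out) == count:
                    break
        q += 1
    return out

def nested_intervals(points, lo=Fraction(1, 4), hi=Fraction(3, 4)):
    """For each point q_k keep a closed third of the current interval that does
    not contain q_k; the closed intervals play the role of the closed balls."""
    intervals = []
    for qk in points:
        third = (hi - lo) / 3
        if lo <= qk <= lo + third:
            lo = hi - third      # q_k lies in the left third: keep the right third
        else:
            hi = lo + third      # otherwise the left third avoids q_k
        intervals.append((lo, hi))
    return intervals

points = rationals_in_unit_interval(20)
intervals = nested_intervals(points)
lo, hi = intervals[-1]
print(float(lo), float(hi))
```

Every point of the last interval lies in $A_1 \cap \cdots \cap A_{20}$ and in the starting
interval; letting the number of steps grow, the interval lengths tend to zero and
completeness of $\mathbb{R}$ supplies the limit point, exactly as in the proof.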

Some years later Banach and Steinhaus realized that Baire's proof did not use the
special structure of the Euclidean space $\mathbb{R}^n$. The proof relies on two properties of
$\mathbb{R}^n$: $\mathbb{R}^n$ is a metric space and $\mathbb{R}^n$ is complete (with respect to the metric). Thus Banach
and Steinhaus formulated the following result which nowadays usually is called the
theorem of Baire.


Theorem C.2 (Theorem of Baire, Version 2) Suppose that $(X,d)$ is a complete
metric space and $\Omega \subset X$ an open nonempty subset of $X$. Then the intersection
$A = \bigcap_{i\in\mathbb{N}} A_i$ of a countable family of open and dense subsets $A_i$ of $\Omega$ is again dense
in $\Omega$.


Proof The proof of Version 1 applies when we replace the Euclidean balls $B(x,r)$
by the open balls $B_d(x,r) = \{y \in X : d(y,x) < r\}$ of the metric space $(X, d)$.

In most applications of Baire’s theorem however the following “complementary” 
version is used. 


Theorem C.3 (Theorem of Baire, Version 3) Suppose $(X, d)$ is a complete metric
space and $B_i$, $i \in \mathbb{N}$, is a countable family of closed subsets of $X$ such that

$$X = \bigcup_{i=1}^{\infty} B_i. \qquad (C.2)$$

Then at least one of the sets $B_i$ has a nonempty interior.


Proof If all the closed sets $B_i$ had an empty interior, then $A_i = X \setminus B_i$, $i \in \mathbb{N}$,
would be a countable family of open and dense subsets of $X$. The second version of
Baire's theorem implies that $A = \bigcap_{i\in\mathbb{N}} A_i$ is dense in $X$, thus $A^c = X \setminus A \ne X$, a
contradiction since $A^c = \bigcup_{i\in\mathbb{N}} B_i$. Therefore, at least one of the sets $B_i$ must have a
nonempty interior.


Definition C.1 A topological space X in which the third version of Baire’s theorem 
holds, is called a Baire space. 

Thus a Baire space $X$ can be exhausted by a countable family of closed subsets
$B_i$ only when at least one of the subsets $B_i$ has a nonempty interior. Then the third
version of Baire's theorem can be restated as saying that all complete metric spaces are
Baire spaces. It follows immediately that all complete metrizable Hausdorff locally
convex topological vector spaces are Baire spaces. In particular, every Banach space
is a Baire space. The spaces of functions $\mathcal{D}_K(\Omega)$ and $\mathcal{S}(\mathbb{R}^n)$ which play a fundamental
role in the theory of distributions are Baire spaces.


C.1 The Uniform Boundedness Principle 


The results of Baire and Banach—Steinhaus have found many very important appli- 
cations in functional analysis. The most prominent one is the uniform boundedness 
principle which we are going to discuss in this section. It has been used in the text 
for many important conclusions. 


Definition C.2 Suppose $(X, P)$ and $(Y, Q)$ are two Hausdorff locally convex topological
vector spaces over the field $\mathbb{K}$. Denote the set of linear functions $T : X \to Y$
by $L(X,Y)$. A subset $A \subset L(X,Y)$ is called


a) pointwise bounded if, and only if, for every $x \in X$ the set $\{Tx : T \in A\}$ is
bounded in $(Y, Q)$, i.e., for every seminorm $q \in Q$,

$$\sup\{q(Tx) : T \in A\} = C_{x,q} < \infty;$$

b) equi-continuous if, and only if, for every seminorm $q \in Q$ there is a seminorm
$p \in P$ and a constant $C > 0$ such that

$$q(Tx) \le C p(x) \quad \forall\, x \in X,\ \forall\, T \in A.$$


Obviously, the elements of an equi-continuous family of linear mappings are continuous
and such a family is pointwise bounded. For an important class of spaces
$(X, P)$ the converse holds too, i.e., a pointwise bounded family of continuous linear
mappings $A \subset L(X, Y)$ is equi-continuous.


Theorem C.4 (Theorem of Banach–Steinhaus) Assume that two Hausdorff locally
convex topological vector spaces over the field $\mathbb{K}$, $(X,P)$ and $(Y, Q)$, are given
and assume that $(X,P)$ is a Baire space. Then every pointwise bounded family $A$ of
continuous linear mappings $T : X \to Y$ is equi-continuous.


Proof For an arbitrary seminorm $q \in Q$ introduce the sets

$$U_{T,q} = \{x \in X : q(Tx) \le 1\}, \quad T \in A.$$

Since $T$ is a continuous linear map, the set $U_{T,q}$ is a closed absolutely convex subset
of $(X, P)$. Hence

$$U_q = \bigcap_{T \in A} U_{T,q}$$

is closed and absolutely convex too.
Now given a point $x \in X$ the family $A$ is bounded in this point, i.e., for every
$q \in Q$ there is a $C_{x,q} < \infty$ such that $q(Tx) \le C_{x,q}$ for all $T \in A$. Choose $n \in \mathbb{N}$,
$n \ge C_{x,q}$; then for all $T \in A$ we find $q(T(x/n)) = \frac{1}{n} q(Tx) \le \frac{1}{n} C_{x,q} \le 1$, hence
$\frac{1}{n} x \in U_q$ or $x \in nU_q$. Clearly, with $U_q$ also the set $nU_q$ is closed (and absolutely
convex); thus $X$ is represented as the countable union of the closed sets $nU_q$:

$$X = \bigcup_{n \in \mathbb{N}} nU_q.$$

Since $(X,P)$ is assumed to be a Baire space, at least one of the sets $nU_q$ must
have a nonempty interior, hence $U_q$ has a nonempty interior, i.e., there are a point
$x_0 \in U_q$, seminorms $p_1, \ldots, p_N \in P$ and positive numbers $r_1, \ldots, r_N$ such that $V =
\bigcap_{i=1}^{N} B_{p_i}(x_0, r_i) \subset U_q$. Now choose $p = \max\{p_1, \ldots, p_N\}$ and $r = \min\{r_1, \ldots, r_N\}$.
We have $p \in P$, $r > 0$, and $B_p(x_0, r) \subset V \subset U_q$. The definition of $U_q$ implies
$q(Tx) \le 1$ for all $x \in B_p(x_0, r)$ and all $T \in A$. Hence for every $\xi \in B_p(0,r)$ and
every $T \in A$: $q(T\xi) \le q(T(x_0 + \xi)) + q(Tx_0) \le 1 + q(Tx_0) \le 2 =: C$, where the last
step uses $x_0 \in U_q$. Lemma 2.2 now implies

$$q(Tx) \le \frac{C}{r}\, p(x) \quad \forall\, x \in X,\ \forall\, T \in A$$

which proves that the family $A$ is equi-continuous.
The Banach–Steinhaus theorem has many applications in functional analysis. We
mention some of them which are used in our text. They are just special cases for the
choice of the domain space $(X, P)$ which has to be a Baire space.
Every Banach space $X$ is a Baire space. Therefore Theorem C.4 applies. If a
family $\{T_\alpha : \alpha \in A\}$ of continuous linear maps $T_\alpha : X \to \mathbb{K}$ is pointwise or
weakly bounded, then, for every $x \in X$, there is a constant $C_x$ such that

$$\sup\{|T_\alpha(x)| : \alpha \in A\} \le C_x < \infty.$$

According to the Banach–Steinhaus theorem such a family is equi-continuous, i.e.,
there is a constant $C < \infty$ such that

$$|T_\alpha(x)| \le C\|x\| \quad \forall\, x \in X,\ \forall\, \alpha \in A$$

and therefore

$$\sup_{\alpha \in A} \|T_\alpha\| \le C.$$

Hence the family $\{T_\alpha : \alpha \in A\}$ is not only pointwise bounded, it is uniformly or
norm bounded: This is the uniform boundedness principle in Banach spaces, see
also Theorem C.4.
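
Completeness of the domain cannot be dropped in the uniform boundedness principle. A standard
counterexample, not taken from the text above, is the space of finitely supported sequences
with the supremum norm (which is not complete) together with the functionals $T_n(x) = n\,x_n$.
The sketch below represents such sequences as Python lists; all function names and the cutoff
`n_max` are hypothetical.

```python
def T(n, x):
    """T_n(x) = n * x_n for a finitely supported sequence x
    (a list; x_n = 0 beyond its length). Indices n start at 1."""
    return n * x[n - 1] if n <= len(x) else 0

def sup_norm(x):
    """Supremum norm of a finitely supported sequence."""
    return max((abs(t) for t in x), default=0)

def pointwise_bound(x, n_max=10_000):
    """sup_n |T_n(x)| over n <= n_max; only the first len(x) terms can be nonzero."""
    return max(abs(T(n, x)) for n in range(1, n_max + 1))

def unit_vector(n):
    """e_n = (0, ..., 0, 1) with the 1 in position n, so sup_norm(e_n) = 1."""
    return [0] * (n - 1) + [1]

x = [1.0, -0.5, 0.25, 2.0]
print(pointwise_bound(x))                                  # finite for each fixed x
print([abs(T(n, unit_vector(n))) for n in (1, 10, 100)])   # |T_n(e_n)| = n
```

For every fixed $x$ the set $\{T_n(x)\}$ is bounded, yet $\|T_n\| \ge |T_n(e_n)| = n$ is unbounded,
so the family is pointwise bounded without being norm bounded. This does not contradict
Theorem C.4 because the domain space is not a Baire space.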

Earlier in this appendix we had argued that the spaces $\mathcal{D}_K(\Omega)$, $\Omega \subset \mathbb{R}^n$, $K \subset \Omega$
compact, are Baire spaces. Thus the Banach–Steinhaus theorem applies to them.
Suppose $\{T_\alpha : \alpha \in A\} \subset \mathcal{D}'_K(\Omega)$ is a pointwise bounded family of continuous
linear forms on $\mathcal{D}_K(\Omega)$, i.e., for every $f \in \mathcal{D}_K(\Omega)$ there is a $C_f < \infty$ such that
$|T_\alpha(f)| \le C_f$ for all $\alpha \in A$. Theorem C.4 implies that this family is equi-continuous,
i.e., there is an $m \in \mathbb{N}$ and a constant $C$ such that

$$|T_\alpha(f)| \le C q_{K,m}(f) \quad \forall\, f \in \mathcal{D}_K(\Omega),\ \forall\, \alpha \in A.$$

Now we come back to the problem of continuity of the pointwise limit of continuous
functions which was the starting point of Baire's investigations. We consider
continuous linear functions on Hausdorff locally convex topological vector spaces.
For the case of continuous nonlinear functions on finite dimensional spaces we refer
to the Exercises (this case is more involved).


Theorem C.5 Suppose $(T_j)_{j\in\mathbb{N}}$ is a sequence of continuous linear functionals on
a Hausdorff topological vector space $(X,P)$ with the property that for every $x \in X$
the numerical sequence $(T_j(x))_{j\in\mathbb{N}}$ is a Cauchy sequence. Then:


a) A linear functional $T$ is well defined on $X$ by

$$T(x) = \lim_{j\to\infty} T_j(x) \quad \forall\, x \in X.$$

b) If $(X,P)$ is a Baire space, then the functional $T$ defined in a) is continuous.


Proof Since the field $\mathbb{K}$ is complete, the Cauchy sequence $(T_j(x))_{j\in\mathbb{N}}$ converges in
$\mathbb{K}$. We call its limit $T(x)$. Thus a function $T : X \to \mathbb{K}$ is well defined. Basic rules of
calculations for limits now prove linearity of this function $T$.

Cauchy sequences in the field $\mathbb{K}$ are bounded, hence, for every $x \in X$ there is a
finite constant $C_x$ such that $\sup\{|T_j(x)| : j \in \mathbb{N}\} \le C_x$. The theorem of Banach–Steinhaus
implies that this sequence is equi-continuous, i.e., there is some $p \in P$
and there is a finite constant $C$ such that $|T_j(x)| \le Cp(x)$ for all $x \in X$ and all $j \in \mathbb{N}$.
Taking the limit $j \to \infty$ in this estimate we get $|T(x)| \le Cp(x)$ for all $x \in X$ and
thus $T$ is continuous.

Part b) of this theorem is often formulated in the following way. 


Corollary C.1 The topological dual space $X'$ of a Hausdorff locally convex Baire
space $(X,P)$ is weakly sequentially complete.

And as a special case of this result we have:


Corollary C.2 The spaces of distributions $\mathcal{D}'(\Omega)$, $\Omega \subset \mathbb{R}^n$ open and nonempty,
and $\mathcal{S}'(\mathbb{R}^n)$ are weakly sequentially complete.


Proof The main point of the proof is to establish that the spaces $\mathcal{D}_K(\Omega)$, $K \subset \Omega$
compact, and $\mathcal{S}(\mathbb{R}^n)$ are complete metrizable and thus Baire spaces. But this has
already been done.

Finally we use Baire's theorem to show in a relatively simple way that the test
function spaces $\mathcal{D}(\Omega)$, $\Omega \subset \mathbb{R}^n$ open and nonempty, are not metrizable.

To this end we recall that the spaces $\mathcal{D}_K(\Omega)$, $K \subset \Omega$ compact, are closed in
$\mathcal{D}(\Omega)$. Furthermore there is a sequence of compact sets $K_j \subset K_{j+1}$ for all $j \in \mathbb{N}$
such that $\Omega = \bigcup_{j\in\mathbb{N}} K_j$. It follows that

$$\mathcal{D}(\Omega) = \bigcup_{j\in\mathbb{N}} \mathcal{D}_{K_j}(\Omega).$$

If $\mathcal{D}(\Omega)$ were metrizable, then according to the third version of Baire's theorem one
of the spaces $\mathcal{D}_{K_j}(\Omega)$ must have a nonempty interior which obviously is not the case
(to show this is a recommended exercise).


Proposition C.1 The test function spaces $\mathcal{D}(\Omega)$, $\Omega \subset \mathbb{R}^n$ open and nonempty, are
complete nonmetrizable Hausdorff locally convex topological vector spaces.


Proof In the book [1] one finds a proof of this result which does not use Baire's
theorem (see Theorem 28, page 71).


C.2 The Open Mapping Theorem 


This section introduces other frequently used consequences of Baire’s results. These 
consequences are the open mapping theorem and its immediate corollary, the inverse 
mapping theorem. 


Definition C.3 A mapping T : E — F between two topological spaces is called 
open if, and only if, T(V) is open in F for every open set V C E. 

Our main interest here are linear open mappings between Banach spaces. Thus 
the following characterization of these mappings is very useful. 


Lemma C.1 A linear map $T : E \to F$ between two normed spaces $E$, $F$ is open
if, and only if,

$$\exists\, r > 0 : \quad B_r^F \subset T(B_1^E) \qquad (C.3)$$

where $B_1^E$ is the open ball in $E$ with radius 1 and center 0 and $B_r^F$ the open ball in
$F$ with radius $r > 0$ and center 0.


Proof If $T$ is an open mapping, then $T(B_1^E)$ is an open set in $F$ which contains
$0 \in F$ since $T$ is linear. Hence there is an $r > 0$ such that relation (C.3) holds.
Conversely assume that relation (C.3) holds and that $V \subset E$ is open. Choose
any $y = Tx \in T(V)$. Since $V$ is open there is a $\rho > 0$ such that $x + B_\rho^E \subset V$.
It follows that $y + T(B_\rho^E) = T(x + B_\rho^E) \subset T(V)$. Relation (C.3) implies that
$B_{r\rho}^F = \rho B_r^F \subset \rho T(B_1^E) = T(B_\rho^E)$ and thus $y + B_{r\rho}^F \subset T(V)$. Therefore $y$ is an
interior point of $T(V)$ and we conclude.


Theorem C.6 (Open Mapping Theorem) Let $E$, $F$ be two Banach spaces and
$T : E \to F$ a surjective continuous linear mapping. Then $T$ is open.


Proof For a proof one has to show relation (C.3). This will be done in two steps.
For simplicity of notation the open balls in $E$ of radius $r > 0$ and center 0
are denoted by $B_r$. Since obviously $B_{1/2} - B_{1/2} \subset B_1$, and since $T$ is linear we
have $T(B_{1/2}) - T(B_{1/2}) \subset T(B_1)$. In any topological vector space for any two
sets $A$, $B$ the relation $\overline{A} - \overline{B} \subset \overline{A - B}$ for their closures is known. This implies
$\overline{T(B_{1/2})} - \overline{T(B_{1/2})} \subset \overline{T(B_1)}$.
Surjectivity and linearity of $T$ give

$$F = \bigcup_{k=1}^{\infty} k\,\overline{T(B_{1/2})}.$$

As a Banach space, $F$ is a Baire space and therefore at least one of the sets $k\,\overline{T(B_{1/2})}$,
$k \in \mathbb{N}$, must have a nonempty interior. Since $y \mapsto ky$ is a surjective homeomorphism
of $F$ the set $\overline{T(B_{1/2})}$ has a nonempty interior, i.e., there is some open nonempty set $V$
in $F$ which is contained in $\overline{T(B_{1/2})}$, and hence $V - V \subset \overline{T(B_{1/2})} - \overline{T(B_{1/2})} \subset \overline{T(B_1)}$.
$V - V$ is an open set in $F$ which contains $0 \in F$. Therefore there is some $r > 0$
such that $B_r^F \subset V - V$ and we conclude

$$B_r^F \subset \overline{T(B_1)}. \qquad (C.4)$$

In the second step we use relation (C.4) to deduce relation (C.3). Pick any $y \in V_r =
B_r^F$, then $\|y\|_F < r$ and we can choose some $R \in (\|y\|_F, r)$. Now rescale $y$ to
$y' = \frac{r}{R}\, y$. Clearly $\|y'\|_F < r$ and therefore $y' \in V_r \subset \overline{T(B_1)}$. Since $0 < \frac{R}{r} < 1$
there is $0 < a < 1$ such that $\frac{R}{r} + a < 1$, i.e., $\frac{R}{r}\,\frac{1}{1-a} < 1$. Since $y'$ belongs to
the closure of the set $T(B_1)$ there is a $y_0 \in T(B_1)$ such that $\|y' - y_0\|_F < ar$. It
follows that $z_0 = \frac{1}{a}(y' - y_0) \in V_r$ and by the same reason there is a $y_1 \in T(B_1)$ such
that $\|z_0 - y_1\|_F < ar$, and again $z_1 = \frac{1}{a}(z_0 - y_1) \in V_r$ and there is a $y_2 \in T(B_1)$
such that $\|z_1 - y_2\|_F < ar$. By induction this process defines a sequence of points
$y_0, y_1, y_2, \ldots$ in $T(B_1)$ which satisfies

$$\Big\| y' - \sum_{i=0}^{n} a^i y_i \Big\|_F < a^{n+1} r, \quad n = 1, 2, \ldots. \qquad (C.5)$$

Estimate (C.5) implies $y' = \sum_{i=0}^{\infty} a^i y_i$. By construction $y_i = T(x_i)$ for some
$x_i \in B_1$. Since $\|a^i x_i\| < a^i$ for all $i$ and since $E$ is complete, the series $\sum_{i=0}^{\infty} a^i x_i$
converges in $E$. Call the limit $x'$. A standard estimate gives

$$\|x'\|_E < \sum_{i=0}^{\infty} a^i = \frac{1}{1-a}.$$

Continuity of $T$ implies $T(x') = \sum_{i=0}^{\infty} a^i T(x_i) = y'$ and if we introduce $x = \frac{R}{r}\, x'$
we get $T(x) = \frac{R}{r}\, y' = y$. By choice of the parameter $a$ the limit $x$ actually belongs
to $B_1$. This follows from $\|x\|_E = \frac{R}{r} \|x'\|_E < \frac{R}{r}\,\frac{1}{1-a} < 1$. We conclude that $y \in V_r$
is the image under $T$ of a point in $B_1$. Since $y$ was any point in $V_r$ this completes the
proof.
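
The successive approximation in the second step has a familiar finite-dimensional analogue: if
an approximate right inverse $S$ of $T$ reduces every residual by a fixed factor $a < 1$, then
summing the corrections solves $Tx = y$ in the limit. The following sketch illustrates this with
a $2 \times 2$ matrix; the matrix `T`, the approximate inverse `S`, and the iteration count are
assumptions chosen for illustration and are not part of the proof.

```python
def matvec(m, v):
    """Product of a 2x2 matrix (list of rows) with a vector of length 2."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

T = [[2.0, 1.0], [1.0, 3.0]]
# Approximate inverse: invert only the diagonal of T. One checks by hand that
# I - T S = [[0, -1/3], [-1/2, 0]], whose maximum-norm is 1/2, so each step
# at least halves the residual in the maximum norm.
S = [[0.5, 0.0], [0.0, 1.0 / 3.0]]

def solve_by_corrections(y, steps=60):
    """Accumulate x = sum_k S r_k with residuals r_0 = y, r_{k+1} = r_k - T S r_k."""
    x = [0.0, 0.0]
    r = list(y)
    for _ in range(steps):
        dx = matvec(S, r)
        x = [x[0] + dx[0], x[1] + dx[1]]
        t_dx = matvec(T, dx)
        r = [r[0] - t_dx[0], r[1] - t_dx[1]]
    return x, r

x, r = solve_by_corrections([1.0, 1.0])
print(x, r)
```

As in the proof, the geometric decay of the residuals controls both the convergence of the
series of corrections and the size of its limit.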


Corollary C.3 (Inverse Mapping Theorem) A continuous linear map $T$ from a
Banach space $E$ onto a Banach space $F$ which is injective has a continuous inverse
$T^{-1} : F \to E$ and there are positive numbers $r$ and $R$ such that

$$r\,\|x\|_E \le \|Tx\|_F \le R\,\|x\|_E \qquad \forall\, x \in E.$$


Proof Such a map is open and thus $T$ satisfies relation (C.3) which implies immediately
that the inverse $T^{-1}$ is bounded on the unit ball $B_1^F$ by $\frac{1}{r}$, hence $T^{-1}$ is
continuous and its norm is $\le \frac{1}{r}$. The two inequalities just express continuity of $T$
(upper bound) and of $T^{-1}$ (lower bound).
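
As a brief worked step, the two inequalities can be read off from the norm bounds as follows. This is a sketch that uses only the estimate $\|T^{-1}\| \le \frac{1}{r}$ from the proof and the operator norm $\|T\|$ of $T$; the particular choice of $R$ is an assumption made for illustration.

```latex
% Lower bound: apply the bound on T^{-1} to the vector y = Tx.
\[
  \|x\|_E = \|T^{-1}(Tx)\|_E \le \frac{1}{r}\,\|Tx\|_F
  \quad\Longrightarrow\quad
  r\,\|x\|_E \le \|Tx\|_F .
\]
% Upper bound: continuity of T; any positive R >= ||T|| will do.
\[
  \|Tx\|_F \le \|T\|\,\|x\|_E \le R\,\|x\|_E
  \qquad \text{for any } R > 0 \text{ with } R \ge \|T\| .
\]
```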


If $E$, $F$ are two Banach spaces over the same field, then $E \times F$ is a Banach space
too when the vector space $E \times F$ is equipped with the norm

$$\|(x, y)\| = \|x\|_E + \|y\|_F.$$

The proof is a straightforward exercise. If $T : E \to F$ is a linear mapping, then its graph

$$G(T) = \{(x, y) \in E \times F : y = Tx\}$$

is a linear subspace of $E \times F$. If the graph $G(T)$ of a linear mapping $T$ is closed in
$E \times F$ the mapping $T$ is called closed. Recall that closed linear mappings or operators
have been studied in some detail in the context of Hilbert spaces (Sect. 19.2).


Theorem C.7 (Closed Graph Theorem) If $T : E \to F$ is a linear mapping from
the Banach space $E$ into the Banach space $F$ whose graph $G(T)$ is closed in $E \times F$,
then $T$ is continuous.


Proof As a closed subspace of the Banach space $E \times F$ the graph $G(T)$ is a Banach
space too, under the restriction of the norm $\|\cdot\|$ to it. Define the standard projection
mappings $p : G(T) \to E$ and $q : G(T) \to F$ by $p(x, y) = x$, respectively
$q(x, y) = y$. Since $G(T)$ is the graph of a linear mapping, both $p$ and $q$ are linear
and $p$ is injective. By definition, $p$ is surjective too. Continuity of $p$ and $q$ follows
easily from the definition of the norm on $G(T)$: $\|p(x, y)\|_E = \|x\|_E \le \|x\|_E + \|y\|_F$
and similarly for $q$. Hence $p$ is a bijective continuous linear map of the Banach
space $G(T)$ onto the Banach space $E$ and as such has a continuous inverse, by the
inverse mapping theorem. Thus $T$ is represented as the composition $q \circ p^{-1}$ of two
continuous linear mappings, $T(x) = q \circ p^{-1}(x)$, for all $x \in E$, and therefore $T$ is
continuous.
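
To see that completeness of the domain cannot simply be dropped, one can look at the differentiation operator. The following is a standard illustration chosen here, not an example taken from the text; the symbols $E_0$, $D$ and $f_n$ are notation assumed for this sketch.

```latex
% Assumed setting: E_0 = C^1[0,1] and F = C[0,1], both with the sup norm.
% E_0 is a normed space but NOT a Banach space in this norm.
\[
  D : E_0 \to F, \qquad D f = f' .
\]
% The graph of D is closed: if f_n -> f and f_n' -> g uniformly on [0,1],
% then f is continuously differentiable and f' = g.
% D is not continuous: for f_n(x) = x^n one finds
\[
  \|f_n\|_\infty = 1, \qquad \|D f_n\|_\infty = \sup_{x \in [0,1]} n\,x^{n-1} = n ,
\]
% so no constant C can satisfy ||D f|| <= C ||f|| for all f in E_0.
```

This does not contradict Theorem C.7, since $E_0$ with the sup norm is not complete.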


Appendix D 
Bilinear Functionals 


A functional of two variables from two vector spaces is called bilinear if, and only
if, the functional is linear in each variable while the other variable is kept fixed. For
such functionals there are two basic types of continuity: the functional can be continuous
with respect to each variable while the other is kept fixed, or it can be continuous
with respect to simultaneous change of both variables. Here we investigate
the important question for which Hausdorff locally convex topological vector spaces
both concepts of continuity agree.


Definition D.1 Let $(X, \mathcal{P})$ and $(Y, \mathcal{Q})$ be two Hausdorff locally convex vector spaces
over the field $\mathbb{K}$ and $B : X \times Y \to \mathbb{K}$ a bilinear functional. $B$ is called


a) separately continuous if, and only if, for every $x \in X$ there are a constant
$C_x$ and a seminorm $q_x \in \mathcal{Q}$ such that $|B(x, y)| \le C_x\, q_x(y)$ for all $y \in Y$, and
for every $y \in Y$ there are a constant $C_y$ and a seminorm $p_y \in \mathcal{P}$ such that
$|B(x, y)| \le C_y\, p_y(x)$ for all $x \in X$.

b) continuous if, and only if, there are a constant $C$ and seminorms $p \in \mathcal{P}$ and
$q \in \mathcal{Q}$ such that $|B(x, y)| \le C\, p(x)\, q(y)$ for all $x \in X$ and all $y \in Y$.


Obviously, every continuous bilinear functional is separately continuous. The converse
statement does not hold in general. However, for a special but very important
class of Hausdorff topological vector spaces one can show that separately continuous
bilinear functionals are continuous.
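
A concrete case in which the converse fails may help. The following is a standard illustration chosen here, not an example from the text; the space $c_{00}$ of finitely supported sequences, the functional $B$ and the vectors $u_N$ are assumptions of this sketch.

```latex
% Assumed setting: X = Y = c_{00} (sequences with finitely many nonzero terms),
% each equipped with the single norm ||x||_infty = sup_n |x_n|.
\[
  B(x, y) = \sum_{n=1}^{\infty} x_n y_n .
\]
% Separately continuous: for fixed x the sum C_x = sum_n |x_n| is finite, and
\[
  |B(x, y)| \le \Bigl( \sum_{n=1}^{\infty} |x_n| \Bigr) \|y\|_\infty
            = C_x \, \|y\|_\infty ,
\]
% and symmetrically with the roles of x and y exchanged.
% Not continuous: for u_N = e_1 + ... + e_N (e_n the n-th unit sequence)
\[
  \|u_N\|_\infty = 1 , \qquad B(u_N, u_N) = N ,
\]
% so |B(x,y)| <= C ||x||_infty ||y||_infty fails as soon as N > C.
```

Both spaces are metrizable but neither is complete in the sup norm, so this example is consistent with a result that requires completeness of one factor.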


Theorem D.1 Suppose that $(X, \mathcal{P})$ and $(Y, \mathcal{Q})$ are two Hausdorff locally convex
metrizable topological vector spaces and assume that $(X, \mathcal{P})$ is complete. Then every
separately continuous bilinear functional $B : X \times Y \to \mathbb{K}$ is continuous.


Proof For metrizable Hausdorff locally convex topological vector spaces continuity
and sequential continuity are equivalent. Thus we can prove continuity of $B$ by
showing that $B(x_j, y_j) \to B(x, y)$ whenever $x_j \to x$ and $y_j \to y$ as $j \to \infty$.
Suppose such sequences $(x_j)_{j \in \mathbb{N}}$ and $(y_j)_{j \in \mathbb{N}}$ are given. Define a sequence of linear
functionals $T_j : X \to \mathbb{K}$ by $T_j(x) = B(x, y_j)$ for all $j \in \mathbb{N}$. Since $B$ is separately
continuous all the functionals $T_j$ are continuous linear functionals on $(X, \mathcal{P})$. Since
the sequence $(y_j)_{j \in \mathbb{N}}$ converges in $(Y, \mathcal{Q})$ we know that $C_{q,x} = \sup_{j \in \mathbb{N}} q_x(y_j)$ is finite
for every fixed $x \in X$. Hence separate continuity implies

$$\sup_{j \in \mathbb{N}} |T_j(x)| \le C_x\, C_{q,x}$$

where the constant $C_x$ refers to the constant in the definition of separate continuity.
This shows that $(T_j)_{j \in \mathbb{N}}$ is a point-wise bounded sequence of continuous linear
functionals on the complete metrizable Hausdorff locally convex topological vector
space $(X, \mathcal{P})$. The Theorem of Banach–Steinhaus implies that this sequence is
equi-continuous. Hence there are a constant $C$ and a seminorm $p \in \mathcal{P}$ such that
$|T_j(x)| \le C\, p(x)$ for all $x \in X$ and all $j \in \mathbb{N}$. This gives

$$|B(x_j, y_j) - B(x, y)| \le |T_j(x_j - x)| + |B(x, y_j - y)| \le C\, p(x_j - x) + |B(x, y_j - y)| \to 0$$

as $j \to \infty$. Therefore $B$ is sequentially continuous and thus continuous.

An application which is of interest in connection with the definition of the tensor
product of distributions (see Sect. 6.2) is the following. Suppose $\Omega_j \subseteq \mathbb{R}^{n_j}$ are open
and nonempty subsets and $K_j \subset \Omega_j$ are compact, $j = 1, 2$. Then the spaces $\mathcal{D}_{K_j}(\Omega_j)$
are complete metrizable Hausdorff locally convex topological vector spaces. Thus
every separately continuous bilinear functional $\mathcal{D}_{K_1}(\Omega_1) \times \mathcal{D}_{K_2}(\Omega_2) \to \mathbb{K}$ is
continuous.




Index 


A 
adjoint, 45 
adjoint operator, 280 
almost periodic functions, 253 
angle 
between two vectors, 217 


B 
Baire space, 582 
Banach 
reflexive 
examples of, 513 
Banach algebra 
involutive or *, 455 
bilinear form, 589 
continuous, 589 
separately continuous, 589 
boundary value, 164 
bounded 
linear function, 17 
pointwise, 271, 583 
uniformly or norm, 271 
Breit—Wigner formula, 35 


C
C*-algebra, 456
calculus of variations, 503
Carathéodory functions, 554
carrier, 173
Cauchy–Riemann equations, 131
Cauchy estimates, 125
Cauchy sequence
  in $\mathcal{D}'(\Omega)$, 33
Cauchy's integral formula I, 123
Cauchy's integral formula II, 123
Cauchy's principal value, 31
chain rule, 53
change of variables, 53
Choi matrix, 494
Choi-Jamiolkowski isomorphism, 496
class $C^n$, 522
coercive, 515
commutation relations
  of Heisenberg, 312
compact
  locally, 266
  relatively, 513
  sequentially, 511
compact operators
  dual space of, 374
complete, 218, 241
  sequentially, 16
completeness relation, 242
completion
  of a normed space, 577
  of an inner product space, 577
constraint, 537
continuous
  at $x_0$, 16
  on X, 16
convergence
  of sequences of distributions, 33
  of series of distributions, 38
convex, 528
convex minimization, 514
convolution
  product of functions, 85
  equation, 101
  in $\mathcal{S}(\mathbb{R}^n)$, 87
  of distributions, 95

core 
of a linear operator, 287 
of a quadratic form, 296 
cyclic vector, 456 
of a self-adjoint operator, 447 


D 
delta function 

Dirac’s, 3 
delta sequences, 33 
density matrix 

or statistical operator, 388 
derivative, 520 

of a distribution, 46 

weak or distributional, 63 
Dido, 504 
Dido’s problem, 543 
differentiable 

twice, 522 
direct method 

of the calculus of variations, 506 
direct orthogonal sum, 229 
direct sum 

of Hilbert spaces, 256 
Dirichlet 

boundary condition, 505 

boundary conditions, 552 

Laplacian, 553 
Dirichlet form, 296 
Dirichlet integral, 505 
Dirichlet problem, 504, 552 
distribution 

Dirac’s delta, 31 

local order, 28 

order, 28 

periodic, 54 

regular, 5 

tempered, 40 

with support in $x_0$, 56
distributions, 27 

of compact support, 27, 41 

regular, 29, 30 

tempered, 27 
divergence type, 558 
domain 

of a linear operator, 277 
dual 

algebraic, 9 

topological, 17, 25 


E 
effects, 483 
eigenfunction 


generalized, 449 
completeness of, 450, 452 

eigenvalue problem 

nonlinear, 558 
elementary solution 

advanced resp. retarded, 115 
equi-continuous, 583 
ergodic, 336 
Euclidean spaces, 221 
Euler, 503 
Euler-Lagrange 

necessary condition of, 526 
Euler-Lagrange equation, 506, 534 
existence of a minimum, 512 
extension 

of a linear operator, 279 

of a quadratic form, 296 


F 
Fock space 
Boson, 262 
Fermion, 262 
form domain, 296 
form norm, 296 
form sum, 303 
Fourier 
hyperfunction, 174 
transform, 134 
transformation, 134 
inverse, 140 
on $L^2(\mathbb{R}^n)$, 152
on $\mathcal{S}(\mathbb{R}^n)$, 142
on $\mathcal{S}'(\mathbb{R}^n)$, 144
Fourier expansion, 242 
Fréchet differentiable, 520 
Friedrichs extension, 302
Fubini’s theorem 
for distributions, 82 
function 
Heaviside, 47 
strongly decreasing, 20 
functional, 504 
positive, 460 
analytic, 173 
positive 
examples, 462 
functional calculus 
of self-adjoint operators, 419 
fundamental theorem of algebra, 125 


fundamental theorem of calculus, 524 


G
Gâteaux derivative, 531
Gâteaux differential, 530
Gagliardo-Nirenberg-Sobolev inequality, 192
Gelfand triple, 441
generalized functions, 5
  Gelfand type S?, 172
Gram determinant, 232, 233
Gram–Schmidt orthonormalization, 241
Green's function, 108


H
Hadamard's principal value, 32
Hamilton operator
  free, 291
heat equation, 112, 158
Heisenberg's uncertainty relation, 390
Helmholtz differential operator, 156
Hermite
  functions, 250
  polynomials, 249
Hilbert cube, 358
Hilbert space, 219
  rigged, 441
  separable, 239, 240
Hilbert space basis, 240
Hilbert spaces
  direct integral of, 446
  measurable field of, 446
Hilbert sum
  or direct sum of Hilbert spaces, 256
Hilbert transform, 99
Hilbert-Schmidt norm, 365
Hilbert-Schmidt operator, 365
  spectral representation, 371
HLCTVS
  metrizable, 579
holomorphic, 121
homomorphism
  of *-algebras, 456
hyperfunctions, 173
hypo-elliptic, 109
hypo-ellipticity
  of $\bar\partial$, 120


I
inequality
  Bessel's, 216
  Cauchy–Schwarz–Bunjakowski, 217
  Schwarz', 216
  triangle, 217
infinitesimal generator, 331
inner product space, 214
integral
  with respect to a spectral family, 406
integral equation, 208
integral of functions, 523
integral operators
  of Hilbert–Schmidt, 310
isometry, 329
isoperimetric problem, 504


K
Kato perturbation, 337
  of free Hamiltonian, 340
Knotensatz, 248
Kolmogorov-Riesz compactness criterion, 196


L
Lagrange multiplier, 537
Lagrange multipliers
  existence of, 541
Laguerre
  functions, 250
  polynomials, 250
Laplace operator, 156
  fundamental solution, 111
Laplace transform
  of distributions, 168
  of functions, 168
Laurent expansion, 127
Lebesgue space, 223
Legendre
  polynomials, 251
Leibniz, 503
Leibniz formula, 51
lemma
  of Wielandt, 312
  of Riemann–Lebesgue, 135
  of Riesz, 361
  of Weyl, 108
length, 215
level surface, 538
linear functional
  positive
    completely additive, 466
    normal, 466
linear hull
  or span, 228
linear map
  k-positive, 473
  completely positive, 473
    examples, 474
    in quantum physics, 486
    on B(H), 480
  positive, 472
linear operator
  bounded, 279, 307
  closable, 284
  closed, 283


closure of, 284 
core of, 287 
essentially self-adjoint, 286, 288 
from H into K, 277 
of multiplication, 289 
positive, 320 
product or composition, 280 
self-adjoint, 286, 287 
symmetric, 286 
unbounded, 279, 307 
local extrema 
necessary and sufficient conditions for, 527 
locally integrable, 14 
lower semi-continuous, 511 


M 
matrix spaces, 221 
maximal, 241 
maximal self-adjoint part 
of A, 415 
Maxwell’s equations 
in vacuum, 116 
metric, 17 
metric space, 575 
completion, 575 
metrizable 
HLCTVS, 18 
minimax principle, 550 
minimization 
constrained, 537 
minimizer 
existence, 512 
uniqueness, 512 
minimizing point, 529 
minimizing sequence, 506 
minimum (maximum) 
global, 526 
local, 526 
modulus 
of an operator, 321 
momentum operator, 290 
monotone, 528 
Morrey’s inequality, 187 
multiplicator space, 52
multiplication operator 
bounded, 309 
unbounded, 310 


N 
Neumann series, 319 
Newton, 503 
norm, 8 
Hilbertian, 220 


induced by an inner product, 217 
of a bounded linear operator, 307 
norm topology, 218 
normal states 
characterization, 467 
nuclear, 441 


O 
ONB 
characterization of, 241 
open ball, 575 
operator 
compact, 355 
completely continuous, 355 
inverse, 278 
nuclear, 373 
of finite rank, 356 
order 
in a *-algebra, 460 
order symbol, 520 
orthogonal, 215 
complement $M^\perp$, 227
orthonormal, 215 
basis, 240 
polynomials, 247 
system, 215 


P 
parallelogram law, 220 
Parseval relation, 242 
partial differential operator 
linear constant coefficients, 48 
linear elliptic, 554 
partial trace, 382 
characterization, 385 
Pauli matrices, 263 
Poisson equation, 112 
polar decomposition, 321 
polarization identity, 220 
pole 
of finite order, 129 
positive element 
in a *-algebra, 460 
positive elements 
in B(H), 460 
pre-Hilbert space, 214 
product 
inner or scalar, 214 
of distributions and functions, 50 
rule, 50 
projection theorem 
for closed subspaces, 229 
for convex sets, 547 
projector 




or projection operator, 325 
pseudo function, 32 
Pythagoras 

theorem of, 215 
p-ball 

open, 10 


Q 
quadratic form, 295 
closable, 296 
closed, 296 
continuous, 295 
densely defined, 295 
first representation theorem, 299 
lower bound, 295 
minimization, 515 
positive, 295 
second representation theorem, 301 
semi-bounded 
from below, 295 
symmetric, 295 
quadratic functional, 515 
quantum operation, 486 
Kraus form, 491 


R 
Radon measure, 67 
regular (critical) point, 526 
regular points 

of an operator A, 344 
regularization 

of distributions, 89 
regularizing sequence, 91 
renormalization, 57 
representation 

cyclic, 456 

of a C*-algebra, 456 
residue, 129 
resolution of the identity, 327, 402 
resolvent, 344 

identity, 346 

set, 344 
restriction 

of a linear operator, 279 
Riemann integral, 341 
Rodrigues’ formula, 250 


S
Schmidt decomposition, 386
Schrödinger operator

free, 158 
Schwartz distributions, 169 
self-adjoint 

geometric characterization, 400 




semi-metric, 17 
semi-norm, 8 

comparable, 10 

smaller than, 10 
semi-norms 

system of, 11 

filtering, 11 
systems of 
equivalent, 12 

sequence 

Cauchy, 15, 218 

converges, 15, 218 
sequence space, 221 
set 

open, 7 
singularity 

essential, 129 

isolated, 129 

removable, 129 
Sobolev 

constant, 552 

embeddings, 183, 551, 559 

inequality, 551 

space, 551 

space of order (k, p), 182 
Sobolev spaces, 223 
Sokhotski–Plemelj formula, 35
solution 

classical or strong, 48, 107 

distributional or weak, 48, 107 

fundamental, 104, 108 
spectral family, 402 
spectrum, 344 

continuous, 348 

discrete, 348 

essential, 427 

point, 348 
square root lemma, 320 
state 

bound, 431 

completely additive, 467 

normal, 467 

of a *-algebra with unit, 465 

scattering, 431 
Stone’s formula, 436 
Sturm-Liouville problem, 561 
subset 

meager, 581 

nonmeager, 581 

nowhere dense, 581 

of first category, 581 

of second category, 581 
subspace 




absolutely continuous, 424 
invariant, 415 
reducing, 415 
singular, 424 
singularly continuous, 423 
support 
of a distribution, 39 
of a spectral family, 402 
of an analytic functional, 173 
of Fourier hyperfunctions, 175 
singular, 40 
support condition, 94 


T 
tangent space 
existence of, 539 
Taylor expansion with remainder, 524 
tensor product 
for distributions, 81 
of functions, 73 
of Hilbert spaces, 259 
totally anti-symmetric, 262 
totally symmetric, 262 
of vector spaces, 258, 259 
projective 
of E, F, 77
of p, q, 76
test function space
$\mathcal{D}(\Omega)$, 19
$\mathcal{E}(\Omega)$, 21
$\mathcal{S}(\Omega)$, 21
theorem 
Baire, version 1, 581 
Baire, version 2, 582 
Baire, version 3, 582 
Banach-Saks, 273 
Banach-Steinhaus, 271, 583 
Busch-Gleason, 485 
Cauchy, 122 
Choi, 494 
closed graph, 588 
convolution, 149 
generalized, 150 
de Figueiredo—Karlovitz, 221 
ergodic - von Neumann, 335 
extension of linear functionals, 235 
Fréchet—von Neumann—Jordan, 220 
Fredholm alternative, 362 
Gleason, 483 
GNS construction, 462 
Hörmander, 154
Hellinger—Toeplitz, 309 
Hilbert—Schmidt, 360 


identity of holomorphic functions, 126 
inverse mapping, 587 
Kakutani, 220 
Kato-Rellich, 338 
Liouville, 125 
Naimark, 459 
of F. Riesz, 266 
of residues, 130 
of Vigier, 466 
open mapping, 586 
Plancherel, 152 
Rellich-Kondrachov compactness, 196 
Riesz—Fischer, 223 
Riesz—Fréchet, 234 
Riesz—Schauder, 359 
Sobolev embedding, 193 
spectral, 410 
functional form, 447 
nuclear, 452 
Stinespring factorization, 475 
Stone, 331 
Weyl, 428 
topological complement, 541 
topological space, 7 
Hausdorff, 12, 13 
topology, 7 
σ-strong, 378
σ-strong*, 378
σ-weak, 378
defined by semi-norms, 11 
of uniform convergence, 16 
strong, 377 
strong*, 377 
ultrastrong, 378 
ultraweak, 378
weak, 378 
topology on B(H) 
norm or uniform, 317 
strong, 317 
weak, 317 
total subset, 231 
trace, 370 
trace class operator, 365 
spectral representation, 371 
trace class operators 
dual space of, 375 
trace norm, 365 


U 

ultradifferentiable functions, 177 
ultradifferential operator, 178 
ultradistributions, 177 

uniform boundedness principle, 271, 584 
unitary operator, 330 




unitary operators 
n-parameter group, 333 
one-parameter group, 330 

upper semi-continuity, 511 


V

variation 
nth, 533 

vector space 
locally convex topological, 7 
topological, 7 

vector state, 465 

von Neumann algebra, 465 


W


wave operator, 114 


weak 
Cauchy sequence, 267 
convergence, 267 
limit, 267 
topology, 267 
weak topology 
$\mathcal{D}'(\Omega)$, 33
Weierstrass theorem
Generalized I, 514
Generalized II, 515
Weierstrass theorems
generalized, 508
Weyl’s criterion, 347 
Wiener—Hopf operators, 312 


